QazLLM vs. ChatGPT: Linguistic Similarities and Advantages in Conveying the Kazakh Ethnocultural Code
DOI: 10.26577/EJPh2022202612
Abstract
The rapid development of Large Language Models has expanded the range of Kazakh-language digital services; however, the heavy reliance of many systems on English-centric data often hampers the accurate rendering of nationally marked content, most notably the Kazakh ethnocultural code. This paper presents a comparative linguistic analysis of QazLLM and ChatGPT, examining their capacity to interpret Kazakh cultural-historical, legal, and idiomatic contexts.
The study is timely given the growing demand in public services, education, and media for Kazakh-language generation that is culturally appropriate and pragmatically robust. The empirical basis comprises 140,000 expert-annotated examples; the results are also compared with OpenAI systems and with solutions developed by ISSAI at Nazarbayev University.
Model outputs were evaluated against indicators including recognition of culture-specific realia, motivated explanations of phraseological units and fixed expressions, accurate delivery of evaluative meaning, and adherence to dialogic norms. A complementary error typology was constructed, covering cultural overgeneralization, concept substitution, and superficial contextual interpretation.
The analysis indicates that QazLLM more often aligns culturally specific notions, phraseology, and evaluative nuances with the surrounding context, whereas ChatGPT tends to be more consistent on general informational queries. The study's scientific contribution lies in refining linguistically grounded criteria for measuring cultural relevance and in substantiating the role of localized corpora in culture-sensitive AI. In practical terms, the findings can inform the improvement of Kazakh-language chatbots, teaching and methodological materials, translation assistants, and cultural reference services.
Keywords: large language model, artificial intelligence, Kazakh language, cultural context, transformer, data corpus, localization.








