Artificial Intelligence for Preserving Indonesian Regional Languages: Machine Learning Approaches to Documenting Endangered Dialects and Cultural Linguistic Heritage
Keywords:
artificial intelligence, machine learning, Indonesian regional languages, cultural heritage
Abstract
Indonesia, one of the world's most linguistically diverse nations, faces a critical challenge in preserving its regional languages amid rapid globalization and the dominance of Bahasa Indonesia and English. With more than 700 living languages, many of them classified as endangered or vulnerable, the country urgently needs innovative approaches to documentation, revitalization, and intergenerational transmission. This study explores the potential of Artificial Intelligence (AI) and Machine Learning (ML) technologies as transformative tools for preserving Indonesian regional languages and their associated cultural linguistic heritage. Through a systematic literature review of 68 publications from 2020 to 2025, it analyzes current applications of AI in language documentation; identifies technological approaches, including Natural Language Processing (NLP), Automatic Speech Recognition (ASR), neural machine translation, and deep learning models; and examines case studies of AI-driven language preservation initiatives both globally and in Indonesia. The study reveals that AI technologies offer unprecedented capabilities for large-scale documentation through automated transcription and annotation of oral traditions; the creation of digital dictionaries and corpora for low-resource languages; the development of language learning applications with speech recognition and feedback systems; and the preservation of intangible cultural heritage embedded in linguistic expressions. However, implementation faces significant challenges: data scarcity for training AI models in low-resource language contexts; technical limitations in processing the complex phonological and morphological features of Austronesian languages; ethical concerns regarding data sovereignty and community consent in digital archiving; and a digital divide that limits access to AI tools in remote indigenous communities.
Keywords: artificial intelligence, machine learning, language preservation, endangered languages, Indonesian regional languages, natural language processing, cultural heritage documentation









