The Development of Digital Corpus as Media and Learning Resources for Language Research in Era 4.0


Agus Syahrani, lecturer at the Faculty of Teacher Training and Education, Tanjungpura University,
Adit Dermawan, researcher at Surau Intellectual for Conservation,
Nur Ahmad Salman Herbowo, researcher at Surau Intellectual for Conservation

Technological development has had a major impact on education, especially on research methods. In higher education, one increasingly relevant course is Computerized Research Methods of Linguistic Corpus, in which the digital corpus is a central focus. According to several researchers (Walsh, 2016; Hishamudin, 2019; Taufiqurrahman, 2021), digital corpora need to be developed through research because the data they contain are authentic and complex (Gries, 2009; Batubara, 2018). A digital corpus also has the advantage that it can be processed with computer applications and digital technology (Walker, 2011; Baker, 2012; Lee, 2018).

This research contributes to the Tanjungpura University Research Master Plan for 2020-2024 in the leading field of digital learning services, especially media and learning resources in Era 4.0. Focusing on media and learning resources based on information and communication technology (ICT), the study aims to develop a digital corpus as a learning medium that can improve the abilities of students and lecturers in linguistics.

Several previous researchers (Bata, 2019; Rajeg, 2019; Hidayat & Saifullah, 2019) have used corpus applications for language research. However, lecturers, students, and researchers at Tanjungpura University still make very little use of such applications, even though digital corpora and corpus applications are important supports for language research.

Several corpus applications are currently available to the public, such as the Indonesian Corpus managed by the Ministry of Education and Culture, SEAlang Library Indonesia, WebCorp, the Leipzig Corpora Collection, and AntConc. However, each has shortcomings, especially in providing data that match a specific research topic. For example, the available data are general and limited, and they do not support the analysis of a more narrowly targeted corpus.

To overcome these limitations, a new application called “Korpus Nusantara” was developed. It is a web-based platform that can be accessed online, contains corpora relevant to research needs, and provides corpus analysis tools directly in the application. It can therefore serve as a more effective medium and learning resource for language researchers.
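As a rough illustration of what corpus analysis tools of this kind typically compute, the following Python sketch builds a word frequency list and a keyword-in-context (KWIC) concordance. It is not the Korpus Nusantara implementation: the function names, the simple tokenizer, and the two sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(texts, top_n=5):
    """Count word tokens across all texts and return the top_n most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_n)


def concordance(texts, keyword, window=3):
    """Return (left context, keyword, right context) tuples for every hit of keyword."""
    keyword = keyword.lower()
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, token, right))
    return lines


# Hypothetical two-sentence corpus from the field of communication.
sample_corpus = [
    "Media sosial mengubah cara komunikasi masyarakat.",
    "Komunikasi digital membutuhkan media yang tepat.",
]

if __name__ == "__main__":
    print(frequency_list(sample_corpus, top_n=2))
    for left, hit, right in concordance(sample_corpus, "komunikasi", window=2):
        print(f"{left:>20} [{hit}] {right}")
```

A real corpus tool would add language-aware tokenization, metadata filters, and collocation statistics; the sketch only shows the two most basic views a researcher usually starts from.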

The specific objectives of this study are to develop a prototype corpus application as a learning medium, reconstruct a digital corpus in the field of communication, trial the prototype, and continue development in 2024 and 2025. The study will also examine lexical aspects of the field of communication, using the corpus application as a medium and learning resource.

The urgency of this research stems from the need for an Indonesian-language corpus application built on digital technology. The study falls within the scope of the Tanjungpura University Research Master Plan (RIP UNTAN) 2015-2039, under the leading theme of media and learning resources in Era 4.0 and its ICT-based research topics. It will also produce international journal articles and books with ISBNs to support the university's main performance indicators (IKU). In its early stages, the research is at technology readiness level (TKT) 3, that is, analytical and experimental proof of concept of important functions and characteristics. It is hoped that the Korpus Nusantara application will be a significant innovation in media and learning resources for higher education, especially in digital linguistic research.


Baker P. Acceptable bias? Using corpus linguistics methods with critical discourse analysis. Crit Discourse Stud. 2012;9(3):247–56.

Bata J. #AkuGalau: Korpus bahasa Indonesia untuk deteksi emosi dari teks. J Elektro. 2019;12(2):103–10.

Batubara IA, Wariyati. Analisis korpus bahasa Inggris sebagai masukan bagi korpus bahasa Indonesia. J Penelit Pendidik Bhs Dan Sastra. 2018;2(2):178–84.

Gries ST. What is corpus linguistics? Linguist Lang Compass. 2009;3(5):1225–41.

Hidayat H, Saifullah AR. Analisis tanggapan pengguna youtube terhadap pidato Presiden Joko Widodo: Analisis wacana berbasis korpus. In: Seminar Internasional Riksa … [Internet]. 2019. p. 407–16. Available from: http://proceedings2.upi.edu/index.php/riksabahasa/article/view/896.

Hishamudin Isam, Mashetoh Abd Mutalib. Pemanfaatan analisis korpus sebagai teknik alternatif pengajaran dan pembelajaran tatabahasa. Int J Lang Educ Appl Linguist. 2019;9(1):13–31.

Lee H, Warschauer M, Lee JH. The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Appl Linguist. 2018;1–34.

Rajeg GPW, Rajeg IM. Pemahaman kuantitatif dasar dan penerapannya dalam mengkaji keterkaitan antara bentuk dan makna. Linguist Indones. 2019;37(1):13–31.

Taufiqurrahman F. Penggunaan konjungsi sebagai representasi penalaran: sebuah kajian korpus bahasa di bidang pendidikan. J Pesona. 2021;6(1):1–19.

Utami NPCP. Analisis ragam bahasa istilah dalam iklan pariwisata di media digital pada masa pandemi covid-19. JOURNEY. 2021;4(1):19–24.

Walker CP. A corpus-based study of the linguistic features and processes which influence the way collocations are formed: Some implications for the learning of collocations. TESOL Q. 2011;45(2):291–312.

Walsh S. Applying corpus linguistics and conversation analysis in the investigation of small group teaching in higher education. In: Working with text and around text in foreign language environments. 2016;1:205–22. Available from: https://doi.org/10.1007/978-3-319-33272-7_13.
