Hung H, Piske T, Perez Toro PA, Arias Vergara T, Maier A (2025)
Publication Language: English
Publication Type: Journal article
Publication year: 2025
Book Volume: 59
Pages Range: 3117-3138
DOI: 10.1007/s10579-025-09851-2
Narrative skills are crucial for young children as they not only indicate literacy and academic performance but also serve as effective tools to foster children’s relationships with the world. However, the linguistic resources for narratives produced by bilingual children are often limited, posing major challenges to the fields of child language development and language resource studies. Moreover, with the increasing prevalence of remote data collection, there are few guidelines on how to collect such data remotely. In this context, we present kidsNARRATE. KidsNARRATE is a non-native speech corpus designed to study the narrative comprehension of Chinese-English bilingual children in their L2 English. KidsNARRATE comprises 6 hours of audio recordings of children taking the narrative test Multilingual Instrument for Narratives (MAIN), along with transcriptions, human-rated scores, and annotations of grammatical and pronunciation errors at the word level. The audio recordings of the English section have been processed to meet the requirements of certain machine learning applications. Additionally, for cognitive baseline comparison, kidsNARRATE contains the audio and video data of the same group of children taking the parallel MAIN test in L1 Chinese. In the course of this study, we developed a remote recording method using accessible recording tools and an easy-to-use setup. Despite its simplicity, the data collected using this method meets the rigorous requirements for machine learning studies and is also suitable for linguistic research. This method can serve as a specific template for researchers and educators seeking to remotely record audio and/or video data for linguistic studies. Overall, the rich linguistic content and compatibility with machine learning processes make kidsNARRATE a valuable resource for studies of early child L2 acquisition and the development of children’s speech patterns in the field of automatic speech recognition. 
Finally, we propose future work regarding data collection methods and second language teaching.
APA:
Hung, H., Piske, T., Perez Toro, P.A., Arias Vergara, T., & Maier, A. (2025). kidsNARRATE: a versatile corpus for studying Chinese-English bilingual L2 narrative skills in preschoolers. Language Resources and Evaluation, 59, 3117-3138. https://doi.org/10.1007/s10579-025-09851-2
MLA:
Hung, Hiuching, et al. "kidsNARRATE: a versatile corpus for studying Chinese-English bilingual L2 narrative skills in preschoolers." Language Resources and Evaluation 59 (2025): 3117-3138.