Automatic Robust Rule-Based Phonetization of Standard Arabic

Sindran F, Mualla F, Bobzin K, Nöth E (2015)


Publication Status: Published

Publication Type: Conference contribution

Publication year: 2015

Publisher: Springer-Verlag

Book Volume: 9302

Pages Range: 442-451

DOI: 10.1007/978-3-319-24033-6_50

Abstract

Phonetization is the process of encoding language sounds using phonetic symbols. It is used in many natural language processing tasks such as speech processing, speech synthesis, and computer-aided pronunciation assessment. A common phonetization approach is the use of letter-to-sound rules developed by linguists for the transcription from orthography to sound. In this paper, we address the problem of rule-based phonetization of standard Arabic. The paper's contributions can be summarized as follows: 1) We discuss the transcription rules of standard Arabic that have been used in the literature on the phonemic and phonetic levels. 2) We suggest important improvements to these rules and test the resulting rule set on large datasets. 3) We present a reliable automatic phonetic transcription of standard Arabic on five levels: phoneme, allophone, syllable, word, and sentence. An encoding which covers all sounds of standard Arabic is proposed, and several pronunciation dictionaries were automatically generated. These dictionaries were manually verified, yielding an accuracy of 100% on standard Arabic texts that do not contain dates, numbers, acronyms, abbreviations, or special symbols. They are available for research purposes along with the software package which performs the automatic transcription.
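
To make the idea of letter-to-sound rules concrete, the following is a minimal Python sketch of a rule-based transcriber for fully diacritized input. It is not the rule set or the encoding proposed in the paper: the consonant table covers only a handful of letters, the Latin output symbols are placeholders chosen for illustration, and the function name `phonetize` is hypothetical. It handles three rule types only: a consonant letter maps to one symbol, a short-vowel diacritic maps to a vowel, and a shadda doubles the preceding consonant.

```python
from typing import List

# Toy letter-to-sound table. The output symbols are illustrative
# placeholders, NOT the encoding proposed in the paper.
CONSONANTS = {
    "\u0628": "b",  # ba
    "\u062A": "t",  # ta
    "\u062F": "d",  # dal
    "\u0631": "r",  # ra
    "\u0633": "s",  # sin
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # mim
    "\u0646": "n",  # nun
}
SHORT_VOWELS = {
    "\u064E": "a",  # fatha
    "\u064F": "u",  # damma
    "\u0650": "i",  # kasra
}
SHADDA = "\u0651"  # gemination mark
SUKUN = "\u0652"   # marks the absence of a vowel


def phonetize(word: str) -> List[str]:
    """Transcribe a fully diacritized word with the toy rules above."""
    phones: List[str] = []
    last_consonant = -1  # index in `phones` of the most recent consonant
    for ch in word:
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            last_consonant = len(phones) - 1
        elif ch in SHORT_VOWELS:
            phones.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            if last_consonant < 0:
                raise ValueError("shadda without a preceding consonant")
            # Double the consonant in place, so the rule works whether the
            # shadda is stored before or after the vowel diacritic.
            phones.insert(last_consonant, phones[last_consonant])
        elif ch == SUKUN:
            continue  # no sound to emit
        else:
            raise ValueError(f"no rule for character {ch!r}")
    return phones


# kataba: kaf + fatha, ta + fatha, ba + fatha
print(phonetize("\u0643\u064E\u062A\u064E\u0628\u064E"))
# darrasa: dal + fatha, ra + shadda + fatha, sin + fatha
print(phonetize("\u062F\u064E\u0631\u0651\u064E\u0633\u064E"))
```

A real system would need many more rules than this sketch (long vowels, hamza, tanwin, the definite article, context-dependent allophones), which is the part the abstract attributes to the rule set developed in the paper.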

How to cite

APA:

Sindran, F., Mualla, F., Bobzin, K., & Nöth, E. (2015). Automatic Robust Rule-Based Phonetization of Standard Arabic. (pp. 442-451). Springer-Verlag.

MLA:

Sindran, Fadi, et al. "Automatic Robust Rule-Based Phonetization of Standard Arabic." Springer-Verlag, 2015. 442-451.