Generating artificial error data for Indonesian preposition error corrections

Budi Irmawati, author

Hiroyuki Shindo, Yuji Matsumoto (Faculty of Engineering, Universitas Indonesia, 2017)

Abstract

Large-scale annotated data written by second language learners are not always available for low-resource languages such as Indonesian. To cope with data scarcity, it is important to generate ‘learner-like’ artificial error sentences when the available real learner data are insufficient and language experts cannot construct data. In this paper, we propose a new method for generating effective error-injected artificial data to increase the number of training examples for preposition error correction tasks. Our method first generates a large amount of noisy artificial error data using a simple error injection method. It then selectively removes the uninformative (noisy) instances from the artificial data. We assume that ‘good’ artificial preposition error data would be effective training data for error correction tasks. Therefore, to evaluate the quality of the generated artificial data, we used them as training data to correct preposition errors in real learners’ sentences. The results indicate that training on our artificial data improves preposition error correction performance. The results also show that training on a smaller set of good instances outperforms training on a much larger set of noisy instances, as well as training on sentences written by native speakers. This method is language-independent and easy to apply to other low-resource languages because it assumes only a small amount of learner error data and uses features that can be extracted automatically from linguistically annotated sentences.
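
The two steps named in the abstract (error injection, then removal of noisy instances) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the preposition list, the names `inject_errors` and `filter_instances`, and in particular the filter, which simply keeps substitutions that also occur in a small sample of learner errors. The article's actual noise-removal criterion is not described in this record.

```python
from typing import Iterator, NamedTuple

# Hypothetical confusion set of common Indonesian prepositions (an assumption,
# not the set used in the article).
PREPOSITIONS = ("di", "ke", "dari", "pada", "kepada")


class ErrorInstance(NamedTuple):
    tokens: tuple[str, ...]  # sentence with one injected preposition error
    position: int            # index of the altered preposition
    correct: str             # original (correct) preposition
    wrong: str               # injected (erroneous) preposition


def inject_errors(tokens: list[str]) -> Iterator[ErrorInstance]:
    """Yield one instance per (preposition position, alternative preposition)."""
    for i, tok in enumerate(tokens):
        original = tok.lower()
        if original not in PREPOSITIONS:
            continue
        for alt in PREPOSITIONS:
            if alt != original:
                corrupted = tokens[:i] + [alt] + tokens[i + 1:]
                yield ErrorInstance(tuple(corrupted), i, original, alt)


def filter_instances(instances, learner_pairs):
    """Placeholder noise filter (assumption): keep only substitutions whose
    (correct, wrong) pair was also observed in a small learner-error sample."""
    return [x for x in instances if (x.correct, x.wrong) in learner_pairs]


sentence = "Saya pergi ke pasar".split()       # "I go to the market"
noisy = list(inject_errors(sentence))          # every possible substitution
kept = filter_instances(noisy, {("ke", "di")}) # hypothetical learner pair
```

In this sketch the injection step deliberately over-generates, and the filter is the only place where learner data enters, which mirrors the abstract's claim that only a small amount of learner error data is needed.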

Keywords

artificial data

Indonesian language

low-resourced languages

noise removal

preposition error correction

Metadata

Collection Type :	Journal Article
Call Number :	UI-IJTECH 8:3 (2017)
Main entry-Personal name :	Budi Irmawati, author
Subject :	Proliferation Outplacement services
Publishing :	Depok: Faculty of Engineering, Universitas Indonesia, 2017

Cataloguing Source :	LibUI eng rda
ISSN :	20869614
Magazine/Journal :	International Journal of Technology
Volume :	Vol. 8, No. 3, April 2017: pp. 549-558
Content Type :	text
Media Type :	unmediated
Carrier Type :	volume
Electronic Access :	https://doi.org/10.14716/ijtech.v8i3.4825
Holding Company :	Universitas Indonesia
Location :	Perpustakaan UI (UI Library), 4th Floor, Journal Collection Room

Call Number	Barcode Number	Availability
UI-IJTECH 8:3 (2017)	08-23-74584452	AVAILABLE