[A question answering system (QAS; in Indonesian, sistem tanya jawab or STJ) is a computer system designed to find the most appropriate answer to a question posed in a natural language. Research on QAS has been carried out since the early 1960s and has developed rapidly since QAS evaluation forums began to be held in the 1990s, continuing to the present. The areas of computer science research that have contributed most to the development of QAS include information retrieval, natural language processing, and artificial intelligence.
This doctoral research specifically explores the answer validation component. The research aims to produce new methods that improve the relevance of text snippets and to find a strategy for answer extraction that combines statistical and symbolic approaches. Two proposals are made to achieve these aims. The first is the use of an answer quality model, developed from a community-based QAS, as a tool for reranking text snippets. The second is the construction of an answer model by learning the smallest and most complete answer bearing phrase (least generalized answer bearing phrase, ABP-LG) as a means of predicting the part of a sentence most likely to contain the answer. The ABP-LG model uses sentence structure information from the question and the text snippet as indicators that determine the likelihood that a part of a sentence contains the answer.
Experiments on a range of data collections show that combining the ABP-LG model with a pattern-based system contributes significantly to improved answer extraction for both factoid and complex ('other') question types. Compared with QAS based on named entities or dictionaries, the advantages of the ABP-LG model are its ability to learn indications of the 'way of answering' and its portability across different question domains, particularly for question types that can cover any context, such as the 'other' type. The weakness of the ABP-LG model observed during the experiments is its dependence on text quality. This problem is partially addressed by the snippet reranking model, which filters the candidate sentences from the information retrieval results that are considered to contain the answer.;
The task of a question answering system (QAS) is to find a final answer given a natural language question. Since it was introduced in the 1960s, the task of QAS has always been at the forefront of technological advances. Along with advances in the fields of information retrieval, computational linguistics, and artificial intelligence, research on QAS has broadened to unstructured textual documents in open domains. Evaluation forums for QAS have steered the development of QAS toward established, large-scale research methodologies and evaluations.
This doctoral research investigates various techniques in the answer validation component. The main objective of the research is to develop new methods for snippet reranking and answer extraction by combining the statistical and the symbolic (semantic) approaches. Two novel techniques are proposed as the results of this doctoral research. The first is a snippet reranking model developed from the characteristics of question-answer pairs in a community-based QAS. This answer quality model forms the basic ingredient of the snippet reranking process. The second is the least generalized answer bearing phrase (ABP-LG) model, which predicts the location of the final answer to a given question within a number of good-quality snippets obtained after the reranking process.
The ABP-LG model employs syntactic tree information from question-answer (snippet) pairs as indicators to predict the answer bearing possibility of each part of a snippet.
The experimental results show that the ABP-LG model, combined with the pattern-based approach, contributes considerably to the answer extraction process for factoid and complex ('other') question types. The main advantage of the ABP-LG model over the common approaches, which are based on named-entity recognizers or dictionaries, is its ability to predict the 'way of answering' for both factoid and complex question types. Based on the analysis of the experimental results, the main weakness of the ABP-LG model is its high dependency on good-quality snippets, which has been partially addressed by employing the snippet reranking model.] |
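As an illustration only, the snippet reranking step mentioned in the abstract can be sketched as follows. The abstract does not specify the features or form of the answer quality model, so the `quality_score` function here is a hypothetical stand-in (simple question-term overlap) and not the model developed in the thesis. The `Snippet` class, the `tokenize` helper, and the `top_k` parameter are likewise assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    retrieval_rank: int  # position assigned by the retrieval stage (0 = best)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return {tok.strip(".,;:!?()'\"").lower() for tok in text.split()} - {""}


def quality_score(question: str, snippet: Snippet) -> float:
    """Hypothetical stand-in for a learned answer quality model.

    Returns the fraction of question terms that also occur in the snippet.
    """
    q_terms = tokenize(question)
    if not q_terms:
        return 0.0
    return len(q_terms & tokenize(snippet.text)) / len(q_terms)


def rerank(question: str, snippets: list[Snippet], top_k: int = 2) -> list[Snippet]:
    """Sort snippets by quality score (descending), ties broken by retrieval rank,
    and keep only the top_k candidates for the answer extraction stage."""
    ranked = sorted(
        snippets,
        key=lambda s: (-quality_score(question, s), s.retrieval_rank),
    )
    return ranked[:top_k]
```

In a pipeline of the kind the abstract describes, the list returned by `rerank` would be passed on to the answer extraction stage, so that a phrase-level model only sees snippets judged likely to contain the answer.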