Search Results

113479 documents found matching the query

Kenneth Jonathan

Re-Ranker Berbasis Fitur untuk Information Retrieval pada Domain Legal = Feature Based Re-Ranker for Legal Domain Retrieval

"Several problems arise as the number of regulations grows: collecting and evaluating regulations takes relatively longer. A system that automates these tasks is therefore needed, and Information Retrieval is one such system. This research aims to improve the effectiveness of an Information Retrieval system through a feature-based re-ranker that uses several types of features: simple quantitative attributes, text-matching scores, and document embeddings. Jaccard similarity scores, BM25 relevance scores, and LemurTF_IDF relevance scores were found to consistently improve re-ranking effectiveness in the legal domain. Features built on BERT and T5 embeddings were also beneficial, but they contributed less than simple computed features such as Jaccard similarity. Using all features as input to a LambdaMART re-ranker significantly improved every system metric by about 4.17%. The highest value of the main metric, recall@3, was achieved by DLH13 (Reranker) at 0.6632, an increase of 5.64%. However, experiments using only the three features above yielded an increase of 3.739%, which was not significant."

Depok: Faculty of Computer Science, Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
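
The feature-based re-ranking described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The feature set [Jaccard, BM25, document length], the helper names (`feature_vectors`, `rerank`), and the toy corpus are all hypothetical. The hand-set weighted sum only stands in for LambdaMART, which learns a tree ensemble from relevance judgments. The BM25 variant uses the smoothed "+1" IDF.

```python
import math
from collections import Counter


def jaccard(query_tokens, doc_tokens):
    """Jaccard similarity between the token sets of a query and a document."""
    q, d = set(query_tokens), set(doc_tokens)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def bm25_scores(query_tokens, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of each document in `corpus` (a list of token lists)."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            # Smoothed IDF ("+1" variant), always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def feature_vectors(query_tokens, corpus):
    """Hypothetical feature set per candidate: [Jaccard, BM25, document length]."""
    bm25 = bm25_scores(query_tokens, corpus)
    return [[jaccard(query_tokens, doc), score, float(len(doc))]
            for doc, score in zip(corpus, bm25)]


def rerank(features, weights):
    """Rank candidate indices by a weighted feature sum.

    Stand-in for a trained LambdaMART model: these weights are fixed by hand,
    whereas LambdaMART learns its ranking function from relevance judgments.
    """
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking, relevant, k=3):
    """Fraction of the relevant documents that appear in the top k positions."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


if __name__ == "__main__":
    corpus = [
        "corporate income tax".split(),
        "mining business permit".split(),
        "personal income tax rate schedule".split(),
    ]
    query = "income tax rate".split()
    ranking = rerank(feature_vectors(query, corpus), weights=[1.0, 1.0, 0.0])
    print(ranking, recall_at_k(ranking, relevant=[2], k=3))  # → [2, 0, 1] 1.0
```

In a real setup the feature vectors, grouped per query with graded relevance labels, would be the training input to a learning-to-rank model rather than being summed directly.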

Dimas Ichsanul Arifin

Temu-balik Dokumen Hukum dengan Model Neural Re-Ranker = Legal Document Retrieval with Neural Re-Ranker Model

"The volume of legal data produced grows every day, so the need for automatic and semi-automatic systems, such as information retrieval systems, is increasing. Legal document retrieval systems help legal practitioners find relevant documents quickly and efficiently. This research therefore explores neural re-ranker models in a legal document retrieval system for English and Indonesian. It also examines several approaches for making the fine-tuning of these models more effective. The neural re-ranker re-ranks the initial search results obtained from the BM25 text-matching model. The implementation fine-tunes several neural re-ranker models: BERT, IndoBERT, mBERT, and XLM-RoBERTa. Experimental results show that the BERT, IndoBERT, and mBERT re-rankers improve the performance of legal document retrieval systems that previously relied only on text-matching models such as TF-IDF and BM25. In one of the scenarios, the Mean Average Precision (MAP) score rose from 0.760 to 0.834, indicating better overall retrieval performance across queries. In addition, freezing the encoder layers proved useful for making retrieval systems built on neural re-rankers more effective."

Depok: Faculty of Computer Science, Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
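
The two-stage setup in the abstract above re-orders BM25 results with a neural model and evaluates the result by MAP. It can be sketched as follows, assuming the fine-tuned BERT-family re-ranker is replaced by a hypothetical score lookup. The document IDs, judgments, scores, and the cut-off `k` are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Set


def rerank_top_k(candidates: Sequence[str], score_fn: Callable[[str], float], k: int) -> List[str]:
    """Re-order the first k first-stage candidates by score_fn; leave the rest in place."""
    head = sorted(candidates[:k], key=score_fn, reverse=True)
    return list(head) + list(candidates[k:])


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query missing from `runs` contributes 0."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


if __name__ == "__main__":
    bm25_run = {"q1": ["d3", "d1", "d2", "d4"]}          # hypothetical first-stage output
    qrels = {"q1": {"d1", "d2"}}                          # hypothetical relevance judgments
    neural_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.1, "d4": 0.5}  # stand-in for a fine-tuned model
    reranked = {q: rerank_top_k(docs, neural_scores.__getitem__, k=3)
                for q, docs in bm25_run.items()}
    print(mean_average_precision(bm25_run, qrels), mean_average_precision(reranked, qrels))
```

In this toy run the first-stage MAP is (1/2 + 2/3) / 2 ≈ 0.583. Re-ranking the top three moves both relevant documents to the top and gives a MAP of 1.0. Re-scoring only the top k keeps the expensive neural model off the long tail of candidates.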

Ridzki Dira Putra

Data cleansing pada PT. XXX untuk data asset under construction = Data cleansing at PT. XXX for asset under construction data

"ABSTRACT

This internship report describes the author's work on cleansing asset under construction data at PT. XXX. During the internship, the author helped repair PT. XXX's asset under construction data, which contained two errors: descriptions longer than 50 characters, and an asset class field that was invalid because it was still empty. The author repaired the descriptions by abbreviating the words in them, using Microsoft Excel features such as the LEN function and conditional formatting. The asset class errors have not yet been fixed because asset class is a new field that did not exist in the previous system."

2019

TA-Pdf

UI - Tugas Akhir Universitas Indonesia Library
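
The description fix in the report above checks whether each description exceeds 50 characters, as an Excel conditional-formatting rule such as `=LEN(A2)>50` would. It then abbreviates words until the description fits. The sketch below does this in Python. The abbreviation dictionary is hypothetical because the report does not list the abbreviations actually used. Any description that is still too long after abbreviation is flagged for manual editing.

```python
import re
from typing import Tuple

MAX_LEN = 50  # description limit stated in the report

# Hypothetical abbreviations; the report does not list the ones actually used.
ABBREVIATIONS = {
    "construction": "constr.",
    "building": "bldg.",
    "installation": "instl.",
    "equipment": "equip.",
}


def too_long(description: str, limit: int = MAX_LEN) -> bool:
    """Python counterpart of an Excel rule such as =LEN(A2)>50."""
    return len(description) > limit


def abbreviate(description: str) -> str:
    """Replace whole alphabetic words (matched case-insensitively) via ABBREVIATIONS."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", repl, description)


def clean(description: str) -> Tuple[str, bool]:
    """Abbreviate only over-long descriptions; return (text, still_too_long)."""
    if not too_long(description):
        return description, False
    short = abbreviate(description)
    return short, too_long(short)


if __name__ == "__main__":
    print(clean("Construction of warehouse building and equipment installation"))
    # → ('constr. of warehouse bldg. and equip. instl.', False)
```

The 61-character example shrinks to 44 characters. Short descriptions pass through unchanged, which matches fixing only the rows a conditional-formatting rule would highlight.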

Ezra Pasha Ramadhansyah

Neural Re-Ranker untuk Mengidentifikasi Pertanyaan Serupa pada Forum Kesehatan Berbahasa Indonesia = Neural Re-Rankers to Identify Duplicate Questions in Indonesian Health Forums

"Similar-question retrieval systems are implemented on many question-answering sites, particularly health Q&A forums. Such systems can be built in various ways, for example with text-based retrievers or neural rankers. The main problem with neural rankers is the lack of research on Indonesian-language models, especially models that use BERT for duplicate-question detection. This study investigates how far a fine-tuned BERT neural re-ranker can improve the ranking quality of a text-based retriever. The model used in this research is BERT, and the test collection is the health forum dataset compiled by Nurhayati (2019). To determine how useful BERT-based models are for re-ranking, experiments were run on the pre-trained models multilingualBERT, IndoBERT, stevenWH, and distilBERT to find the best model to fine-tune. The study also proposes two fine-tuning methods. The first is an attention mask filter with IDF thresholding. The second is layer freezing, which freezes some of the layers inside BERT. These models and methods were then tested under predefined scenarios. The results show that the re-ranker can improve the quality of the text-based retriever when fine-tuned with particular methods and scenarios.

Some models achieved better results on the health forum dataset with the BM25 and TF-IDF text-based retrievers. The multilingualBERT model combined with layer-freezing fine-tuning gave the best results of all combinations. The largest gain came from combining BM25 with multilingualBERT using layer freezing, an increase of 0.051 over the BM25 baseline."

Depok: Faculty of Computer Science, Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
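
One plausible reading of the "attention mask filter with IDF thresholding" in the abstract above is to mask out tokens whose IDF falls below a threshold. Such tokens are too common to help distinguish questions, so BERT does not attend to them. The sketch below implements that reading and is an assumption, not the thesis's code. It works on whole words rather than WordPiece subwords. It always keeps `[CLS]`/`[SEP]` and treats tokens absent from the IDF table as rare. The output has the same 0/1 shape as the `attention_mask` input of a BERT model.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: BERT-style [CLS]/[SEP] markers are always kept visible.
KEEP_ALWAYS = {"[CLS]", "[SEP]"}


def idf_table(documents: Sequence[Sequence[str]]) -> Dict[str, float]:
    """IDF = log(N / df) for every token in the collection."""
    n = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log(n / count) for tok, count in df.items()}


def idf_attention_mask(tokens: Sequence[str], idf: Dict[str, float], threshold: float) -> List[int]:
    """1 = attend to the token, 0 = filter it out because its IDF is below threshold.

    Tokens missing from the IDF table are treated as rare and kept (assumption).
    """
    mask = []
    for tok in tokens:
        if tok in KEEP_ALWAYS or idf.get(tok, math.inf) >= threshold:
            mask.append(1)
        else:
            mask.append(0)
    return mask


if __name__ == "__main__":
    questions = [
        ["what", "medicine", "for", "headache"],
        ["what", "causes", "stomach", "pain"],
        ["what", "medicine", "for", "fever"],
    ]
    idf = idf_table(questions)
    tokens = ["[CLS]", "what", "medicine", "for", "headache", "[SEP]"]
    print(idf_attention_mask(tokens, idf, threshold=0.1))  # → [1, 0, 1, 1, 1, 1]
```

Here "what" occurs in every question, so its IDF is 0 and it is masked. The other proposed method, layer freezing, is usually done in PyTorch by setting `requires_grad = False` on the parameters of the chosen encoder layers. It is not sketched here because it needs a model checkpoint.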