Search Results

Found 15 documents matching the query

Gusman Dharma Putra

Ekstraksi Informasi Bencana Alam di Indonesia dari Berita di Media Siber = Information Extraction on Natural Disaster Event in Indonesia from Cyber Media News

"At a minimum, the type of disaster, its location, and the time of the event are needed to indicate that a natural disaster has occurred. News in cyber media is one source of such information, and a natural disaster information system can use it as a data source, but the news text must first be converted into structured data. Text mining is a technique for obtaining structured data from a text collection. This research explored the effectiveness of data mining techniques for extracting the natural disaster type, location, and event time reported in cyber media news. Web scraping was used to collect news text from cyber media, and manual annotation was performed to create gold-standard data. The study used machine-learning text classification to identify the type of natural disaster reported. Binary classification was applied for each of the following disaster types: hurricanes, floods, eruptions, earthquakes, forest and land fires, droughts, landslides, and tsunamis. The algorithms evaluated for text classification were Multinomial Naive Bayes, Support Vector Machine, Random Forest, Linear Regression, and AdaBoost. The Stanford NER application was used to recognize location entities in the text, and a gazetteer was then used to map them to administrative areas. Regular-expression pattern matching was used to extract the date of the disaster event. The F1 scores of the eight classification models for hurricane, flood, eruption, earthquake, forest and land fire, drought, landslide, and tsunami news are 0.731, 0.767, 0.760, 0.761, 0.749, 0.680, 0.763, and 0.600, respectively. The F1 scores for location and event-time extraction are 0.795 and 0.881."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

TA-Pdf

UI - Tugas Akhir Universitas Indonesia Library
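
The abstract above mentions regular-expression pattern matching for extracting event dates. The following is only a minimal sketch of that general idea, not the thesis's actual patterns: it assumes dates are written as "day month-name year" in Indonesian (for example "3 Januari 2020"), and the names `MONTHS`, `DATE_PATTERN`, and `extract_dates` are hypothetical.

```python
import re
from datetime import date

# Indonesian month names mapped to month numbers.
# Assumption: dates appear as "<day> <month name> <year>", e.g. "3 Januari 2020".
MONTHS = {
    "januari": 1, "februari": 2, "maret": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "agustus": 8,
    "september": 9, "oktober": 10, "november": 11, "desember": 12,
}

# Day (1-2 digits), a known month name, and a 4-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return every 'day month year' date found in the text, in order."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            # Skip impossible dates such as "31 Februari 2020".
            continue
    return found
```

A real news corpus would likely need further patterns (numeric dates, weekday prefixes, relative expressions such as "kemarin"), which this sketch does not cover.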

Muhammad Okky Ibrohim

Klasifikasi multi label untuk identifikasi ujaran kebencian dan ujaran kasar pada Twitter berbahasa Indonesia = Multi-label classification to identify hate speech and abusive language on Indonesian Twitter

"ABSTRACT

Hate speech and abusive language spreading on social media need to be identified automatically to prevent conflict among citizens. Moreover, hate speech has a target, category, and level that also need to be identified to help the authorities prioritize the hate speech cases that must be addressed immediately. This thesis discusses multi-label text classification to identify abusive language and hate speech, including the target, category, and level of hate speech, on Indonesian-language Twitter. The problem was addressed with a machine learning approach using the Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) classifiers, with Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as data transformation methods. The features used are term frequency features (word n-grams and character n-grams), orthography features (exclamation marks, question marks, uppercase letters, and lowercase letters), and lexicon features (negative sentiment lexicon, positive sentiment lexicon, and abusive lexicon). The experimental results show that, in general, the RFDT classifier with the LP transformation method gives the best accuracy with fast computation time. The RFDT classifier with LP transformation using word unigram features gives an accuracy of 66.16%. When identifying only abusive language and hate speech (without the target, category, and level of hate speech), the RFDT classifier with LP transformation using combined word unigram, character quadgram, positive sentiment lexicon, and abusive lexicon features gives an accuracy of 77.36%."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2019

T52442

UI - Tesis Membership Universitas Indonesia Library
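
The abstract above compares Binary Relevance (BR) and Label Power-set (LP) as data transformation methods for multi-label classification. The sketch below illustrates only what the two transformations do to a label matrix, under the assumption that each example carries a tuple of 0/1 labels; the function names and the label names `hate_speech` and `abusive` are hypothetical, and no classifier from the thesis is reproduced here.

```python
def binary_relevance(label_rows, label_names):
    """BR: split the multi-label target into one binary target list per label."""
    return {
        name: [row[i] for row in label_rows]
        for i, name in enumerate(label_names)
    }


def label_powerset(label_rows):
    """LP: treat each distinct label combination as one multi-class target."""
    class_ids = {}  # label combination -> class id, in order of first appearance
    targets = []
    for row in label_rows:
        key = tuple(row)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


# Four example tweets labeled as (hate_speech, abusive).
rows = [(1, 0), (1, 1), (0, 0), (1, 1)]
per_label = binary_relevance(rows, ["hate_speech", "abusive"])
lp_targets, lp_classes = label_powerset(rows)
```

With BR, one binary classifier would be trained per entry of `per_label`; with LP, a single multi-class classifier would be trained on `lp_targets`, which preserves label co-occurrence at the cost of a class count that grows with the number of distinct combinations.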

Nofa Aulia

Deteksi ujaran kebencian teks panjang berbahasa Indonesia menggunakan data facebook = Hate speech detection on Indonesian long text using facebook data

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2019

T51811

UI - Tesis Membership Universitas Indonesia Library

Zihan Nindia

Analisa Long-Short Term Memory dan BERT Embeddings pada Klasifikasi Teks Data SMS Spam Berbahasa Indonesia = Analysis of Long-Short Term Memory and BERT Embeddings on Text Classification of SMS Spam Data in Indonesian

"The rapid development of information and communication technology has brought many changes to human life. One of the most significant developments is the emergence of the Short Message Service (SMS). SMS is often misused as a medium for defrauding telephone users. Fraud is often carried out by sending SMS messages massively and at random, up to ten thousand per day, to all users, and these messages become SMS spam for many people. Text classification using Long-Short Term Memory (LSTM) and BERT embeddings is performed to classify SMS data into two categories, spam and non-spam (ham). The data consist of 5575 labeled SMS messages. Using the LSTM + BERT method, this research achieves an accuracy of 97.85%. This method produces better results than the three previous models. The LSTM + BERT model achieves an accuracy 0.65% higher than LSTM."

Depok: Fakultas Teknik Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Ilham Aulia Malik

Klasifikasi data Twitter dalam bahasa Indonesia untuk tema konten pada aplikasi Fajr: Studi kasus XYZ = Classification of Twitter data in Indonesian language for content theme Fajr application: Case study XYZ / Ilham Aulia Malik

"[ABSTRACT

Fajr is a mobile application that provides Islamic content for Muslim daily life. To attract more users, its developers created a main feature called Fajr Cards. However, Fajr Cards has not yet attracted users' attention, as shown by the small number of people using the feature. As a content-based feature, Fajr Cards can be improved by offering content that is relevant to users. Twitter, as a microblogging social media platform, has a large volume of real-time data, so it can serve as a source of current data for analysis. Twitter data can be analyzed with text mining, one form of which is text classification. The purpose of this research is to determine which classification method is best for classifying Fajr Cards content themes. The methodology consists of the text mining preprocessing stages followed by text classification. The expected result is a picture of how Twitter data is processed for classification and which classification method is best for classifying Fajr Cards content themes.;Fajr application is a mobile application that contains Islamic contents for moslem