Hate speech and abusive language spreading on social media need to be identified automatically to avoid conflict between citizens. Moreover, hate speech has a target, category, and level that also need to be identified to help the authorities prioritize hate speech that must be addressed immediately. This thesis discusses multi-label text classification to identify abusive language and hate speech, including the target, category, and level of hate speech, in Indonesian Twitter. The problem is approached with machine learning, using Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) classifiers, with Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as data transformation methods. The features used are term frequency (word n-grams and character n-grams), orthography (exclamation marks, question marks, uppercase, lowercase), and lexicon features (negative sentiment lexicon, positive sentiment lexicon, and abusive lexicon). The experimental results show that, in general, the RFDT classifier with LP as the transformation method gives the best accuracy with fast computation time: RFDT with LP transformation using the word unigram feature achieves 66.16% accuracy. When only identifying abusive language and hate speech (without identifying the target, category, and level of hate speech), RFDT with LP transformation using a combination of word unigrams, character quadgrams, the positive sentiment lexicon, and the abusive lexicon achieves 77.36% accuracy.
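The abstract names three problem-transformation methods (BR, LP, CC) without defining them. The following is a minimal sketch in plain Python, with no ML library. It shows how each method reshapes a multi-label dataset so that an ordinary single-label classifier such as SVM, NB, or RFDT can be trained on it. The function names and toy data are hypothetical; the thesis's actual pipeline is not described in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

Features = List[float]
LabelVector = Tuple[int, ...]  # one 0/1 entry per label


def binary_relevance(X: Sequence[Features], Y: Sequence[LabelVector]) -> List[List[Tuple[Features, int]]]:
    """BR: one independent binary dataset per label (label correlations are ignored)."""
    n_labels = len(Y[0])
    return [[(x, y[j]) for x, y in zip(X, Y)] for j in range(n_labels)]


def label_powerset(Y: Sequence[LabelVector]) -> Tuple[List[int], Dict[int, LabelVector]]:
    """LP: map each distinct label combination to one class id of a multi-class problem."""
    combo_to_id: Dict[LabelVector, int] = {}
    targets: List[int] = []
    for y in Y:
        if y not in combo_to_id:
            combo_to_id[y] = len(combo_to_id)
        targets.append(combo_to_id[y])
    id_to_combo = {i: combo for combo, i in combo_to_id.items()}  # decodes predictions
    return targets, id_to_combo


def classifier_chain(X: Sequence[Features], Y: Sequence[LabelVector]) -> List[List[Tuple[Features, int]]]:
    """CC (training view): dataset j gets the features plus the true labels 0..j-1."""
    n_labels = len(Y[0])
    datasets = []
    for j in range(n_labels):
        datasets.append([(list(x) + [float(v) for v in y[:j]], y[j]) for x, y in zip(X, Y)])
    return datasets


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy feature vectors
    Y = [(1, 1, 0), (0, 1, 0), (1, 1, 0)]    # toy 3-label targets
    targets, decode = label_powerset(Y)
    print(targets)                        # [0, 1, 0]
    print(decode[targets[1]])             # (0, 1, 0)
    print(classifier_chain(X, Y)[2][0])   # ([1.0, 0.0, 1.0, 1.0], 0)
```

LP turns the task into a single multi-class problem, so one classifier sees label co-occurrence directly. However, it can only predict label combinations that appear in the training data. For CC at prediction time, the true labels appended to the features are replaced by the predictions of the earlier classifiers in the chain.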
LiDAR data is increasingly replacing two-dimensional data for representing geographic information because of the richness of the information it carries. One LiDAR processing task is land-cover semantic segmentation, for which many deep learning models have been developed. These algorithms use the Euclidean distance to express the distance between points or nodes. However, the Euclidean distance representation is a poor fit for the random nature of LiDAR data. To address this mismatch, this study applies a non-Euclidean distance representation that is adaptively updated using the covariance of the point cloud data. The idea is implemented in the Dynamic Graph Convolutional Neural Network (DGCNN) algorithm and evaluated on the Kupang LiDAR dataset. The proposed method achieves an accuracy of 75.55%, better than the PointNet baseline (65.08%) and the original DGCNN (72.56%). The improvement is attributed to multiplying by the inverse covariance of the point cloud data, which increases the similarity of each point to its class.
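The abstract does not give the exact distance formula. One plausible reading, assumed here, is the Mahalanobis form d²(p, q) = (p − q)ᵀ Σ⁻¹ (p − q), where Σ is the covariance of the point cloud. This form matches the stated "multiplication by the inverse covariance". The following is a minimal sketch in plain Python, with hypothetical function names. It builds the k-nearest-neighbour graph that DGCNN's edge convolutions operate on, using this distance in place of the Euclidean one. In DGCNN the graph is recomputed from each layer's features; this sketch covers only a single step on raw xyz coordinates.

```python
from typing import List, Sequence

Point = Sequence[float]
Matrix = List[List[float]]


def covariance(points: Sequence[Point]) -> Matrix:
    """Sample covariance (divides by n - 1) of 3-D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
             for j in range(3)] for i in range(3)]


def inverse3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate; raises if it is singular."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]


def mahalanobis_sq(p: Point, q: Point, inv_cov: Matrix) -> float:
    """Squared distance (p - q)^T inv_cov (p - q); squaring keeps neighbour ranking."""
    diff = [p[k] - q[k] for k in range(3)]
    return sum(diff[r] * inv_cov[r][s] * diff[s] for r in range(3) for s in range(3))


def knn_graph(points: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, indices of its k nearest neighbours (itself excluded)."""
    inv_cov = inverse3(covariance(points))  # the covariance-based "update" of the metric
    graph = []
    for idx, p in enumerate(points):
        dists = sorted((mahalanobis_sq(p, q, inv_cov), j)
                       for j, q in enumerate(points) if j != idx)
        graph.append([j for _, j in dists[:k]])
    return graph


if __name__ == "__main__":
    # Hypothetical toy cloud whose spread is mostly along the x-axis.
    cloud = [(0, 0, 0), (3, 0, 0), (-3, 0, 0), (10, 0, 0), (-10, 0, 0),
             (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    print(knn_graph(cloud, 2)[0])  # [1, 2]
```

Because most of the toy cloud's variance lies along x, the inverse covariance down-weights offsets in that direction. As a result, the origin's two nearest neighbours become (3, 0, 0) and (-3, 0, 0), whereas under the Euclidean distance they would be the unit-distance points on the y and z axes.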