Hate speech and abusive language spreading on social media needs to be identified automatically to avoid conflict between citizen. Moreover, hate speech has target, criteria, and level that also needs to be identified to help the authority in prioritizing hate speech which must be addressed immediately. This thesis discusses multi-label text classification to identify abusive and hate speech including the target, category, and level of hate speech in Indonesian Twitter. This problem was done using machine learning approach with Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) classifier and Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as data transformation method. The features that used are term frequency (word n-grams and character n-grams), ortography (exclamation mark, question mark, uppercase, lowercase), and lexicon features (negative sentiment lexicon, positif sentiment lexicon, and abusive lexicon). The experiment results show that in general RFDT classifier using LP as the transformation method gives the best accuracy with fast computational time. RFDT classifier with LP transformation using word unigram feature give 66.16% of accuracy. If only for identifying abusive language and hate speech (without identifying the target, criteria, and level of hate speech), RFDT classifier with LP transformation using combined fitur word unigram, character quadgrams, positive sentiment lexicon, and abusive lexicon can gives 77,36% of accuracy.
"