Credit scoring is an assessment used to determine the creditworthiness of borrowers. There is no consensus on when the method first emerged. However, human subjectivity and the inability of humans to process large volumes of loan applications every day make machine-learning-based credit scoring highly necessary. To detect potentially problematic borrowers early, this final project predicts loan status as one of three classes: default, fully paid, and late. Accordingly, the project builds a multiclass credit scoring classifier using the CatBoost method, which is chosen to handle multiclass classification on heterogeneous and imbalanced data. The data are online peer-to-peer (P2P) lending data from LendingClub and contain three types of information: loan information, borrower information, and the borrower's loan history. The LendingClub P2P loan data are imbalanced and have overlapping classes. Three SMOTE-NC sampling-strategy scenarios are applied to observe the effects of class imbalance and class overlap on this multiclass classification problem, yielding three models. The performance of the CatBoost models is evaluated using precision, recall, and f1-score, as well as accuracy and one-vs-all AUC. CatBoost performs well on class 1 (fully paid), with an f1-score above 0.75 in all three scenarios. However, the results for class 0 (default) and class 2 (late) remain unsatisfactory: the highest f1-score for class 0 (default) is only 0.15, while the f1-score for class 2 (late) is 0.04 in all three scenarios. The effects of imbalanced data and overlapping classes on precision, recall, f1-score, accuracy, and one-vs-all AUC vary by class.
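The gap between the majority class and the two minority classes is easier to see when the metrics are computed per class. The following is a minimal sketch in plain Python, not the evaluation code used in this project. The labels are hypothetical toy values, not LendingClub data, and the helper names `per_class_report` and `accuracy` are introduced here only for illustration.

```python
from typing import Dict, List


def per_class_report(y_true: List[int], y_pred: List[int],
                     n_classes: int) -> Dict[int, Dict[str, float]]:
    """Compute precision, recall, and f1-score separately for each class."""
    report = {}
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 0 = default, 1 = fully paid, 2 = late.
# The class proportions are invented to mimic an imbalanced data set.
y_true = [1] * 80 + [0] * 15 + [2] * 5
# A classifier that predicts "fully paid" for almost every loan.
y_pred = [1] * 78 + [0] * 2 + [1] * 13 + [0] * 2 + [1] * 5

report = per_class_report(y_true, y_pred, n_classes=3)
print("accuracy:", round(accuracy(y_true, y_pred), 4))
for label, scores in report.items():
    print(label, {name: round(value, 4) for name, value in scores.items()})
```

In this toy case the overall accuracy is 0.80, yet the f1-score of class 2 is 0 because no late loan is predicted correctly. This is why accuracy alone can hide poor minority-class performance on imbalanced data.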