Search Result

Found 6 documents matching the query

Fatma Irmadani

Klasifikasi Multikelas Credit Scoring pada Pinjaman Peer-to-Peer Menggunakan Metode Multinomial Logistic Regression = Multiclass Classification of Credit Scoring on Peer-to-Peer Loans Using the Multinomial Logistic Regression

"Credit scoring is a method for predicting the risk that a prospective borrower will default or fall behind on payments. Loan service providers use credit scoring when prospective borrowers apply for loans. One company that applies credit scoring to its borrowers is Lending Club, an online Peer-to-Peer (P2P) lending service provider in the United States. This study performs multiclass credit scoring classification based on loan status (Loan Status) in the Lending Club dataset. Loan status has 3 classes: default, fully paid, and late. Within the supervised learning approach to machine learning, multiclass credit scoring classification can be performed with Multinomial Logistic Regression (MLR), an extension of Logistic Regression that handles multiclass classification. The MLR model is implemented under 3 different SMOTE sampling strategy scenarios. The classification results are evaluated with accuracy, precision, recall, F1-Score, and One versus All AUC (Area Under the Curve). The best-performing MLR model achieves an accuracy of 0.67 and an average One versus All AUC of 0.724932. Per class, the default class has a precision of 0.47, a recall of 0.02, and an F1-Score of 0.04; the fully paid class has a precision of 0.85, a recall of 0.83, and an F1-Score of 0.84; and the late class has a precision of 0.02, a recall of 0.84, and an F1-Score of 0.04. These results show that the default class scores poorly on every evaluation metric, the fully paid class scores well on every evaluation metric, and the late class scores reasonably well only on recall (0.84). The poor results are thought to be caused by imbalanced data and overlapping classes."
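The per-class F1-Scores reported in this abstract can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic only; the function and variable names are illustrative and do not come from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) pairs as reported in the abstract.
reported = {
    "default": (0.47, 0.02),
    "fully paid": (0.85, 0.83),
    "late": (0.02, 0.84),
}

# Round to two decimals, the precision used in the abstract.
f1_by_class = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
print(f1_by_class)
```

The rounded values agree with the F1-Scores stated in the abstract (0.04, 0.84, 0.04), which shows how a very low precision or recall pulls F1 down even when the other metric is high.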

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Widya Fajar Mustika

Analisis akurasi model XGBoost untuk klasifikasi multikelas: studi kasus prediksi tingkat klaim risiko pemohon pada asuransi jiwa = Analyzing accuracy of XGBoost model for multiclass classification: a case study of the applicant level claim risk prediction for life insurance

"Assessing the claim risk level of insurance applicants is an important part of life insurance, so applicants need to be classified. The claim risk level in life insurance is determined from applicants' historical data. Applying for membership in a life insurance plan takes a considerable amount of time. However, a machine learning model can help classify prospective applicants by risk level quickly. One such model is Extreme Gradient Boosting (XGBoost), a decision tree-based model, which is used here to predict risk in life insurance. Missing values in the data are handled with several strategies during data preprocessing to increase the accuracy of the XGBoost model. The results show that the XGBoost model achieves an accuracy of 0.60730, measured in kappa, which indicates that the model performs very well and can be applied to predicting the claim risk level of life insurance applicants. Compared with the decision tree, random forest, and Bayesian ridge models, XGBoost still performs best at handling missing values in the data used."
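The abstract reports its score in kappa but does not say which kappa variant was used. As a hedged illustration only, the sketch below computes the plain (unweighted) Cohen's kappa, which corrects observed agreement for the agreement expected by chance; the thesis may well have used a weighted variant, and the labels here are made-up toy data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction equals truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not from the thesis).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
kappa = cohen_kappa(y_true, y_pred)
print(kappa)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so it is a stricter yardstick than raw accuracy on imbalanced risk levels.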

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2019

T54273

UI - Tesis Membership Universitas Indonesia Library

Frischi Dwi Nabilah

Implementasi Metode CatBoost untuk Klasifikasi Multikelas Credit Scoring pada Pinjaman Peer-to-peer = Implementation of the CatBoost Method for Multi-class Classification of Credit Scoring on Peer-to-peer Lending

"Credit scoring is a form of assessment used to determine the creditworthiness of borrowers. There is no agreement on when the method began to develop. However, human subjectivity and the inability of humans to process large volumes of loan applications every day are the reasons credit scoring with machine learning is so badly needed. To detect potentially problematic borrowers early, this final project predicts loan status in three classes: default, fully paid, and late. A model based on the CatBoost method is used to predict loan status in this multi-class credit scoring classification problem. CatBoost is intended to handle multi-class classification on heterogeneous and imbalanced data. The data are online peer-to-peer (P2P) lending data from LendingClub, which contain three types of information: loan information, borrower information, and the borrower's loan history. The LendingClub P2P loan data are imbalanced and have overlapping classes. Three SMOTE-NC sampling strategy scenarios are run to observe the effects of imbalanced data and overlapping classes on this multi-class classification problem, yielding three models. The performance of the CatBoost models is evaluated by precision, recall, f1-score, accuracy, and AUC one-vs-all. CatBoost performs well on class 1 (fully paid), where the f1-score exceeds 0.75 in all three scenarios. However, results for class 0 (default) and class 2 (late) remain poor: the highest f1-score for class 0 (default) is only 0.15, while the f1-score for class 2 (late) is 0.04 in all three scenarios. The effects of imbalanced data and overlapping classes on precision, recall, f1-score, accuracy, and AUC one-vs-all vary by class."
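The AUC one-vs-all metric named in this abstract treats each class in turn as the positive class and all others as negative, then averages the per-class AUCs. The sketch below is one possible pure-Python reading of that idea, using the pairwise-ranking definition of AUC and an unweighted average; the thesis's actual averaging scheme is not stated, and the labels and probabilities are hypothetical toy values.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_all_auc(y_true, probs, n_classes):
    """Return (unweighted mean AUC, list of per-class AUCs)."""
    per_class = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c vs. the rest
        scores = [row[c] for row in probs]             # predicted P(class c)
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes, per_class


# Toy example: 0 = default, 1 = fully paid, 2 = late (hypothetical data).
y_true = [0, 1, 1, 2, 1, 0]
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
    [0.4, 0.5, 0.1],
]
macro_auc, per_class_auc = one_vs_all_auc(y_true, probs, n_classes=3)
print(macro_auc, per_class_auc)
```

Because each class contributes equally to the unweighted average, a rare class such as late influences the result as much as the dominant fully paid class.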

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Yoel Zabarro

Analisis Kinerja Metode Random Forest untuk Klasifikasi Multikelas Credit Scoring = Performance Analysis of the Random Forest Method for Credit Scoring Multiclass Classification

"Credit scoring is the process of evaluating an individual's creditworthiness. Financial companies need credit scoring to minimize credit risk, because it can determine the eligibility of debtors. One financial company that provides P2P (Peer-to-Peer) loan services and applies credit scoring in evaluating debtors is LendingClub. This thesis performs multiclass credit scoring classification based on loan status, which consists of 3 classes: default, fully paid, and late. Multiclass credit scoring classification can be carried out with supervised learning, one of the machine learning approaches. The supervised learning method used is random forest, a tree-based information retrieval method in which each tree contains a random set of variables. The random forest model is implemented under three different SMOTE sampling strategy scenarios. In each scenario the model is run 5 times and evaluated using precision, recall, f1-score, accuracy, and AUC one vs all. The best average accuracy is 0.78, and the best average AUC one vs all is 0.679179. Per class, for the default class the best precision is 0.39, the best recall is 0.27, and the best f1-score is 0.28. For the fully paid class the best precision is 0.82, the best recall is 0.95, and the best f1-score is 0.88. For the late class the best precision is 0.02, the best recall is 0.02, and the best f1-score is 0.02. Overall, the models in all three scenarios are good only at predicting class 1 (fully paid) and perform poorly on class 0 (default) and class 2 (late). This is thought to result from imbalanced data and class overlap in the dataset."
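SMOTE, which this abstract uses to rebalance the classes, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that interpolation step, not the thesis's implementation: the points, the neighbour count `k`, and the number of new samples are hypothetical, and in practice the sampling strategy of each scenario would decide how many samples to generate per class.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (e.g. the rare "late" loans).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 2.0), (1.2, 1.8)]
synthetic = smote_oversample(minority, n_new=6)
print(synthetic)
```

Every synthetic point lies on a segment between two real minority points, so oversampling adds no information outside the region the minority class already occupies. This is one reason class overlap can remain a problem after rebalancing.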

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Bima Tri Ariyanto

Penerapan Data Mining Untuk Klasifikasi Intrusi Melalui Jaringan Internet: Studi Kasus Badan Meteorologi Klimatologi Dan Geofisika = Application of Data Mining for Intrusion Classification through the Internet Network: Case Study of Meteorology Climatology and Geophysics Agency

"Anomalous activity on BMKG's internet network cannot yet be fully analyzed manually, so several BMKG systems have been affected by this cyber activity. Intrusion detection and classification is an important measure BMKG can take in dealing with cyber attacks. This research aims to build the best classification model for classifying intrusions. The datasets used are the CICIDS2017 dataset and BMKG internet data, whose class imbalance is then handled using SMOTE. To improve classification performance, feature selection is performed and three variations in the number of features are proposed: 7 features, 18 features, and all 82 features. The classification covers binary classification, which distinguishes attack traffic from normal traffic, and multiclass classification, which distinguishes several types of attack. The classification algorithms used are K-Nearest Neighbor (KNN), Decision Tree (DT), and Random Forest (RF). The best model for binary classification is DT with all 82 features, with 99.1% accuracy. The best model for multiclass classification is also DT with all 82 features, with 99.2% accuracy. This research shows that machine learning-based classification models can improve cyberattack detection and classification with high accuracy. BMKG can implement this model for automated detection and rapid response to threats, conduct field trials, provide staff training, and ensure regular maintenance and monitoring of the model. These steps can help BMKG improve network security and protect its data and services from future cyberattacks."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2024

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Vinezha Panca

Klasifikasi multikelas kanker otak dengan metode multiple support vector machine recursive feature elimination dan twin support vector machine = Multiclass brain cancer classification using multiple support vector machine recursive feature elimination and twin support vector machine

"ABSTRACT

Cancer is one of the leading causes of death worldwide. Brain cancer, in particular, is cancer that occurs in the central nervous system. One way machine learning can support brain cancer research is by detecting the type of brain cancer from microarray data. This is a multiclass classification problem. With the one versus one approach, a multiclass problem with k classes is transformed into k(k-1)/2 two-class problems. Because brain cancer data have a very large number of features, feature selection is necessary. In this research, Multiple Multiclass Support Vector Machine Recursive Feature Elimination (MMSVM-RFE) is implemented as the feature selection method and Twin Support Vector Machine (TWSVM) as the classification method. MMSVM-RFE trains SVM-RFE on each two-class problem, so that each two-class problem has its own feature ranking. As a classification method, TWSVM seeks one hyperplane per class such that the data of a class lie as close as possible to that class's hyperplane and as far as possible from the other hyperplane. In the simulation using a linear kernel in MMSVM-RFE and a linear kernel in TWSVM, the highest average accuracy is 95.33%, using 200 features. In the simulation using a linear kernel in MMSVM-RFE and an RBF kernel in TWSVM, the highest average accuracy is 87%, using 70 features. When the validation process also covers feature selection, the highest average accuracy obtained is 90.67%, using 90 features."
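The one versus one decomposition mentioned in this abstract can be made concrete by enumerating every unordered pair of classes, which gives k(k-1)/2 two-class problems. The sketch below only illustrates that count; the class labels are hypothetical placeholders, since the abstract does not list the cancer types.

```python
from itertools import combinations


def one_vs_one_problems(classes):
    """Return every unordered pair of classes, one binary problem per pair."""
    return list(combinations(classes, 2))


# Hypothetical class labels standing in for the brain cancer types.
classes = ["type_A", "type_B", "type_C", "type_D"]
pairs = one_vs_one_problems(classes)

k = len(classes)
print(len(pairs), k * (k - 1) // 2)
for first, second in pairs:
    print(f"{first} vs {second}")
```

In a scheme such as the MMSVM-RFE described above, each of these pairs would get its own feature ranking, so the amount of training work grows quadratically with the number of classes.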

2016

S66302

UI - Skripsi Membership Universitas Indonesia Library
