Search Results

Found 3 documents matching the query

Adib Muhammad Prawirahutama

Model Machine Learning untuk Klasifikasi Cepat Kualitas Air Menggunakan Feature Selection dan Parameter yang Dipilih Secara Fundamental = Machine Learning Model for Quick Water Quality Classification Using Feature Selection and Fundamentally Selected Parameters

"Water is the most important resource for life, so its quality needs to be monitored and maintained. In water studies, machine learning (ML) offers numerous opportunities for classifying water quality (WQ). The accuracy of WQ classification depends on the model used, the size of the data set, and the water parameters used to train the learning models. In this paper, SVM, NB, DT, RF, and CatBoost models are used to model WQ classification. Filter, wrapper, and embedded feature selection methods are compared, along with a model whose parameters are selected manually based on their ease of measurement. Using embedded feature selection and a DT classifier with SMOTE as the class balancing method, the model achieves 99.33% accuracy, 99.43% precision, 99.33% recall, and a 99.34% F1-score. A model for real-time water quality indication is also obtained with the CatBoost classifier, which achieves 92.31% accuracy, 91.72% precision, 92.31% recall, and a 91.75% F1-score."

Depok: Fakultas Teknik Universitas Indonesia, 2022

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library

Haris Hamzah

Prediksi hubungan struktur molekul dan aktivitas biologi inhibitor dipeptidil peptidase-4 menggunakan metode deep neural network dengan metode pemilihan fitur catboost = Prediction of molecular structure and biological activity relationship of dipeptidyl peptidase-4 inhibitors using deep neural networks with catboost as feature selection method

"Type 2 diabetes mellitus (T2DM) is a chronic metabolic disease that often affects adults. T2DM is characterized by a decrease of insulin in the body. The dipeptidyl peptidase-4 (DPP-4) enzyme catalyzes the degradation of incretin peptide hormones such as gastric inhibitory peptide (GIP) and glucagon-like peptide-1 (GLP-1), which results in decreased insulin synthesis. DPP-4 inhibitors are promising drug candidates for T2DM because they block the DPP-4 enzyme, which would otherwise inactivate the GLP-1 and GIP hormones. This study uses DPP-4 inhibitor data, from which features are extracted using the Extended-Connectivity Fingerprint (ECFP) and Functional-Class Fingerprint (FCFP) methods. The extracted features are used as input vectors for a deep neural network (DNN) that classifies DPP-4 inhibitors as active or inactive compounds. In addition, CatBoost is proposed as a feature selection method applied to the ECFP and FCFP features. The study compares the performance of the DNN with and without CatBoost feature selection. The results indicate that the DNN using ECFP_6 feature extraction with a feature selection proportion of 90% achieves sensitivity, specificity, accuracy, and MCC values of 0.927, 0.881, 0.906, and 0.810, respectively."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020

T-pdf

UI - Master's Thesis (Tesis) Membership Universitas Indonesia Library

Frischi Dwi Nabilah

Implementasi Metode CatBoost untuk Klasifikasi Multikelas Credit Scoring pada Pinjaman Peer-to-peer = Implementation of the CatBoost Method for Multi-class Classification of Credit Scoring on Peer-to-peer Lending

"Credit scoring is a form of assessment used to determine the creditworthiness of borrowers. There is no agreement on when this method started to develop. However, human subjectivity and the inability of humans to process large volumes of loan applications every day are the reasons why credit scoring with machine learning is highly needed. To detect potentially problematic borrowers early, this final project classifies loan status into three classes: default, fully paid, and late. To that end, a CatBoost model is used to predict loan status as a multi-class credit scoring classification problem. CatBoost is chosen to handle multi-class classification on heterogeneous and imbalanced data. The data used is online peer-to-peer (P2P) lending data from LendingClub, which includes three types of information: loan information, borrower information, and the borrower's loan history. The LendingClub P2P loan data is imbalanced and has overlapping classes. Three SMOTE-NC sampling strategy scenarios are applied to observe the effects of imbalanced data and overlapping classes on this multi-class classification problem, resulting in three models. The performance of the CatBoost models is evaluated based on precision, recall, and f1-score, as well as accuracy and one-vs-all AUC. CatBoost yields good results for class 1 (fully paid), as the f1-scores in all three scenarios are above 0.75. However, the results for class 0 (default) and class 2 (late) are still unsatisfactory: the highest f1-score for class 0 (default) is only 0.15, while the f1-score for class 2 (late) is 0.04 in all three scenarios. The effects of imbalanced data and overlapping classes on precision, recall, f1-score, accuracy, and one-vs-all AUC vary depending on the class."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
