Search Results

31619 documents found matching the query

Vabiyana Safira Desdhanty

Klasifikasi Data Kanker Hati Menggunakan Metode Improved Random Forest-based Rule Extraction = Liver Cancer Classification Using Improved Random Forest-Based Rule Extraction

"Cancer is one of the leading causes of death worldwide, with approximately ten million deaths each year. Liver cancer ranks sixth among the most common cancers in both men and women. According to previous studies, early detection is important to prevent the cancer from spreading to other organs. This has motivated the use of machine learning in medicine to classify cancer data and support accurate diagnosis. However, a single machine learning algorithm does not always achieve good accuracy. This study therefore analyzes the effect of using a Genetic Algorithm for hyperparameter tuning on classification accuracy. Random Forest with Genetic Algorithm hyperparameter tuning achieves an accuracy of 85% when 90% of the data is used for testing. By contrast, Random Forest alone achieves a highest accuracy of 73%, with 40% of the data used for testing."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
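The liver cancer entry above tunes Random Forest hyperparameters with a Genetic Algorithm. The standard-library sketch below shows the general shape of such a search (tournament selection, uniform crossover, mutation, and elitism). It is not the thesis's implementation: the two-parameter search space and the `fitness` function are hypothetical stand-ins, and in a real run `fitness` would train and score a Random Forest, for example by cross-validation.

```python
import random

# Hypothetical search space for two Random Forest hyperparameters (inclusive bounds).
SPACE = {"n_estimators": (10, 300), "max_depth": (2, 30)}


def fitness(params):
    # Stand-in objective (assumption): in practice this would be the validation
    # accuracy of a Random Forest trained with `params`. This toy function simply
    # peaks at n_estimators=200, max_depth=12 so the search has something to find.
    return (-((params["n_estimators"] - 200) / 100) ** 2
            - ((params["max_depth"] - 12) / 5) ** 2)


def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from either parent with probability 0.5.
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in SPACE}


def mutate(ind, rng, rate=0.2):
    # Resample each gene uniformly within its bounds with probability `rate`.
    return {k: rng.randint(lo, hi) if rng.random() < rate else ind[k]
            for k, (lo, hi) in SPACE.items()}


def tournament(pop, rng, size=3):
    # Select the fittest of `size` randomly chosen individuals.
    return max(rng.sample(pop, size), key=fitness)


def genetic_search(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual always survives
        children = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [elite] + children
    return max(pop, key=fitness)


best = genetic_search()
print(best, fitness(best))
```

Because the best individual is carried over unchanged each generation, the best fitness never decreases; the population size, mutation rate, and number of generations are illustrative values only.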

Fildzah Zhafarina

Klasifikasi data kanker hati menggunakan metode Twin Support Vector Machines = Liver cancer classification using Twin Support Vector Machines methods

Liver cancer is a leading cause of cancer death worldwide. In Indonesia, liver cancer has the second-highest incidence among men, at 12.4 per 100,000 population, with an average mortality of 7.6 per 100,000 population. This final project discusses primary liver cancer of the hepatocellular carcinoma type. The Twin Support Vector Machines (Twin SVM) method is implemented to classify liver cancer data based on CT scan results. The data are numerical CT scan results of liver cancer patients, obtained from the Radiology Laboratory of RSUPN Cipto Mangunkusumo (Cipto Mangunkusumo Hospital). Twin SVM is an extension of SVM that classifies samples using two hyperplanes. Two kernels are used with Twin SVM: polynomial and radial basis function (RBF). Twin SVM with the polynomial kernel produces the highest accuracy, 77.30%, with 90% training data and 10% testing data. The lowest accuracy, 60.10%, is obtained with the RBF kernel, using 10% training data, 90% testing data, and the parameter C = 1. Overall, Twin SVM with the polynomial kernel yields better accuracy than with the RBF kernel.

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
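Twin SVM fits two non-parallel hyperplanes, each close to one class and far from the other, and labels a new sample by whichever plane is nearer. The sketch below shows only that prediction rule, for the linear case, with hand-picked planes (hypothetical values). Training each plane requires solving its own quadratic program, and the polynomial and RBF kernels used in the thesis replace the planes with kernel-generated surfaces; neither step is shown here.

```python
import math


def distance_to_plane(x, w, b):
    # Perpendicular distance |w·x + b| / ||w|| from point x to the hyperplane w·x + b = 0.
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def twin_svm_predict(x, plane_pos, plane_neg):
    # Assign x to the class whose own hyperplane lies closer to it.
    d_pos = distance_to_plane(x, *plane_pos)
    d_neg = distance_to_plane(x, *plane_neg)
    return 1 if d_pos <= d_neg else -1


# Hypothetical, hand-picked planes (w, b); a real Twin SVM obtains each
# one by solving its own quadratic program on the training data.
plane_pos = ([1.0, -1.0], 0.0)   # x1 - x2 = 0, assumed to lie near class +1
plane_neg = ([1.0, 1.0], -6.0)   # x1 + x2 = 6, assumed to lie near class -1

print(twin_svm_predict([1.0, 1.2], plane_pos, plane_neg))  # 1
print(twin_svm_predict([5.0, 1.0], plane_pos, plane_neg))  # -1
```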

Antonius Rangga Hapsoro Wicaksono

Klasifikasi Kanker Serviks Menggunakan Metode Stacking Classifier Random Forest-Decision Tree-Support Vector Machine = Cervical Cancer Classification Using Stacking Classifier Random Forest-Decision Tree-Support Vector Machine

"Cancer is a leading cause of death worldwide, with 18.1 million cases and 10 million deaths in 2020; in Indonesia there were 396,914 cases and 235,511 deaths. Cervical cancer is the fourth most common cancer globally and the second most common in Indonesia. Death rates are higher in low- and middle-income countries because of limited access to preventive measures. Cervical cancer is often difficult to detect until it reaches an advanced stage, and machine learning is one approach to early detection. This research applies a stacking classifier that combines decision tree, support vector machine, and random forest models as first-level learners, with logistic regression as the meta learner, to classify patients with and without cervical cancer. The dataset, from the UCI Repository, contains data on 858 patients at risk of cervical cancer at Hospital Universitario de Caracas, Venezuela. The data were split into 70% for training and 30% for testing, over five random trials. The model achieved an average accuracy of 95.03%, precision of 99.05%, sensitivity of 95.49%, specificity of 89.39%, and G-mean of 92.37%. Although the stacking ensemble performed well, single-classifier models performed slightly better, though the difference was not significant."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
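The cervical cancer entry reports accuracy, precision, sensitivity, specificity, and G-mean. The helper below shows how these are usually computed from binary confusion-matrix counts, with G-mean as the geometric mean of sensitivity and specificity. The counts are hypothetical, and the abstract does not state which class the thesis treats as positive.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Geometric mean: high only when BOTH classes are recognised well.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical confusion-matrix counts, not taken from the thesis.
m = binary_metrics(tp=90, fp=2, tn=18, fn=10)
print(m)
```

As a rough consistency check, √(0.9549 × 0.8939) ≈ 0.924, close to the reported 92.37%; a small gap is expected if G-mean was averaged over the five trials rather than computed from the averaged sensitivity and specificity.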

Fiftitah Repfian Aszhari

"Klasifikasi Data Stroke Menggunakan Random Forest dengan Recursive Feature Elimination" = "Classification of Stroke Data Using Random Forest with Recursive Feature Elimination"

Stroke is a disease with a high risk of death and disability. Strokes are generally classified into two types: ischemic and hemorrhagic. Fast and accurate classification of the stroke type is needed to choose the right treatment and prevent more serious harm to the patient. In this study, stroke classification is performed with a machine learning approach, using stroke data consisting of laboratory examinations. Because the data record many laboratory components, some of them may be irrelevant or uninformative for classifying stroke; if left unhandled, they can degrade the model's performance and increase its computation time. This study therefore uses Random Forest (RF) with Recursive Feature Elimination (RFE) feature selection to classify the stroke data. The model performs better when classifying with the features selected by RFE than with all features in the stroke data. However, it performs well on the ischemic stroke class but not well enough on the hemorrhagic stroke class, likely because the data contain more ischemic than hemorrhagic cases. To keep performance optimal on both classes, the Synthetic Minority Oversampling Technique (SMOTE) is applied to balance them.
Applying RF with RFE and SMOTE to the stroke data yields better model performance than classifying stroke data that has not been balanced with SMOTE.

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
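SMOTE balances classes by synthesizing new minority samples rather than duplicating existing ones: each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours. The standard-library sketch below is a simplified variant that picks seed samples at random; the toy data and parameter values are hypothetical, and the RFE step from the thesis is not shown.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Simplified SMOTE: create synthetic minority samples on the line segments
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sq_dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical 2-feature minority class (e.g. hemorrhagic stroke rows after feature selection).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(minority, n_synthetic=6)
print(len(new_rows))  # 6
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.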

Ilsya Wirasati

Klasifikasi Kanker Hati Menggunakan Convolutional Neural Network dan Gated Recurrent Unit = Classification of Liver Cancer Using Convolutional Neural Network and Gated Recurrent Unit

"The liver is one of the most metabolically active organs in the body and performs homeostatic and synthetic functions essential for human survival. Liver cancer was estimated to be the sixth most frequently diagnosed cancer and the fourth leading cause of cancer death worldwide in 2018. Liver cancer is detected using magnetic resonance imaging (MRI) or computed tomography (CT). However, fewer than 40% of patients are diagnosed at an early stage, and for advanced liver cancer only palliative treatment options with poor survival are available. Research on suitable methods for classifying liver cancer is therefore needed. One such method is machine learning, which learns patterns from historical training data to predict the characteristics of new data. This final project uses two machine learning methods: Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU). The key feature of CNN is convolution, which transforms the input into a set of features through filters (kernels). The key feature of GRU is its update and reset gates, which allow it to retain important earlier information. Here, CNN is used to extract features from the image data and GRU to classify them. Combining the two into CNN-GRU aims to improve on CNN's performance in classifying liver cancer images. Over five experiments, CNN-GRU achieved a highest accuracy of 81.25%, while CNN achieved a highest accuracy of 77.78%."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
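The abstract credits GRU's update and reset gates with retaining earlier information. The pure-Python sketch below shows one GRU time step in a common formulation: the update gate z interpolates between the previous state and a candidate state, and the reset gate r controls how much of the previous state feeds the candidate. The random weights and the input sequence (standing in for CNN feature vectors) are hypothetical; how the thesis turns CNN features into a sequence is not stated in the abstract, and libraries differ on whether z weights the old or the new state.

```python
import math
import random


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gate(act, W, U, b, x, h):
    # act(W·x + U·h + b), applied element-wise
    return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h, p):
    z = gate(sigmoid, p["Wz"], p["Uz"], p["bz"], x, h)   # update gate
    r = gate(sigmoid, p["Wr"], p["Ur"], p["br"], x, h)   # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]               # reset gate scales the old state
    h_tilde = gate(math.tanh, p["Wh"], p["Uh"], p["bh"], x, rh)  # candidate state
    # New state: element-wise blend of old state and candidate, weighted by z.
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]


def random_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = mat(n_hidden, n_in)
        p["U" + g] = mat(n_hidden, n_hidden)
        p["b" + g] = [0.0] * n_hidden
    return p


# Hypothetical sequence of 4-d feature vectors, e.g. produced by a CNN.
params = random_params(n_in=4, n_hidden=3)
h = [0.0, 0.0, 0.0]
sequence = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
for x in sequence:
    h = gru_step(x, h, params)
print(h)  # final hidden state, which a classifier layer would map to a label
```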

Terry Argyadiva

Analisis kinerja metode Deep Feature Extraction pada klasifikasi Covid-19 menggunakan Data Citra X-Ray = Performance analysis of Deep Feature Extraction for Covid-19 classification based on X-Ray Image

"Coronavirus Disease 2019 (COVID-19) is an outbreak first identified at the end of 2019 in Wuhan, China, which then spread throughout the world. Reverse Transcription Polymerase Chain Reaction (RT-PCR) was then used to diagnose COVID-19. However, RT-PCR diagnosis takes a long time, so the American College of Radiology (ACR) recommended radiographic tools such as computed tomography (CT) scans and X-rays as additional diagnostic methods. X-ray was chosen as the additional method here because the equipment is more flexible and already widely available in health clinics. In this study, a Convolutional Neural Network (CNN) is used for Deep Feature Extraction and classical machine learning classifiers for classification, to build a model that classifies X-ray images as normal lungs, COVID-19, or pneumonia. The CNN architecture is ResNet-50, and the classifiers are Support Vector Machine (SVM), Random Forest, K-Nearest Neighbor (KNN), and Extreme Gradient Boosting (XGBoost). The datasets are the COVID-19 Image Data Collection by J. P. Cohen, the ChestX-ray8 dataset by the National Institutes of Health, and the Chest X-ray Dataset on Mendeley Data. The model is first trained with ResNet-50; feature vectors are then extracted from its fully connected layer and classified with SVM, Random Forest, KNN, and XGBoost in place of ResNet-50's own classification layer. In the simulations, the best accuracy, 94.22%, is obtained by combining ResNet-50 with SVM; the best recall, 94%, by ResNet-50 with KNN; the best precision, 94.36%, by ResNet-50 alone; and the best running time, 0.0006 seconds, also by ResNet-50 alone."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
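In the COVID-19 entry, ResNet-50 serves as a feature extractor and classical classifiers operate on the extracted feature vectors. The sketch below illustrates only that second stage, using a plain k-nearest-neighbour classifier (one of the four classifiers named). The three-dimensional feature vectors are hypothetical: real ResNet-50 features are far higher-dimensional (2048 values after global average pooling), and the feature-extraction stage itself is not shown.

```python
import math
from collections import Counter


def knn_predict(train_feats, train_labels, query, k=3):
    # Sort training vectors by Euclidean distance to the query,
    # then take a majority vote among the k nearest.
    ranked = sorted(zip(train_feats, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-d "deep features" for the three classes in the study.
feats = [[0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.85, 0.15, 0.05],
         [0.10, 0.90, 0.10], [0.20, 0.80, 0.00], [0.15, 0.85, 0.05],
         [0.00, 0.10, 0.90], [0.10, 0.20, 0.80], [0.05, 0.10, 0.95]]
labels = ["normal"] * 3 + ["covid"] * 3 + ["pneumonia"] * 3
print(knn_predict(feats, labels, [0.12, 0.88, 0.02]))  # covid
```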

Gregorius Vidy Prasetyo

Metode easy ensemble dengan random forest untuk mengatasi masalah klasifikasi pada kelas data tidak seimbang = Easy ensemble with random forest to handle imbalanced data in classification

"ABSTRACT

Imbalanced class categories are common in domains such as healthcare and retail. For example, patients with a particular disease may be relatively rare in a study, and fraudulent transactions are far fewer than normal ones. This condition, known as imbalanced data, degrades model performance, especially on the minority class, and can cause problems at the problem-definition, algorithm, and data levels. Several methods have been developed to address it; one recent method is Easy Ensemble, which is claimed to overcome the drawbacks of conventional approaches such as random undersampling and to improve a model's ability to predict the minority class. This thesis discusses Easy Ensemble and its application with Random Forest classifiers to handle imbalanced data in a credit scoring case. Two empirical studies are conducted on real cases from the data science competition sites nhacks.id and kaggle.com, with majority-to-minority class proportions of 70:30 and 94:6. The results show that resampling with Easy Ensemble significantly improves Random Forest performance on the minority class, as measured by minority-class recall before and after resampling. On the first dataset (nhacks.id), minority recall rises from 0.47 before resampling (reported as 0.49 in the English abstract) to 0.82 after resampling. On the second dataset (kaggle.com), it rises from 0.14 to 0.71."

2019

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library
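Easy Ensemble draws several independent random undersamples of the majority class, each as large as the minority class, trains one classifier per balanced subset, and combines their outputs. The sketch below follows that structure. To stay self-contained it uses a hypothetical nearest-centroid base learner in place of the Random Forest used in the thesis (the original Easy Ensemble used AdaBoost), and the data are made up.

```python
import random


def easy_ensemble_subsets(majority, minority, n_subsets=4, seed=0):
    """Draw n_subsets independent random undersamples of the majority class,
    each the size of the minority class, and pair each with all minority samples."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))  # no replacement within a subset
        subsets.append([(x, 0) for x in sampled] + [(x, 1) for x in minority])
    return subsets


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def train_nearest_centroid(subset):
    # Hypothetical stand-in base learner; the thesis used Random Forest.
    c0 = centroid([x for x, y in subset if y == 0])
    c1 = centroid([x for x, y in subset if y == 1])

    def score(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1.0 if d1 < d0 else 0.0  # vote for the minority class (1) or not
    return score


def easy_ensemble_predict(models, x, threshold=0.5):
    # Average the base learners' minority-class votes and threshold the result.
    return int(sum(m(x) for m in models) / len(models) >= threshold)


majority = [[i * 0.1, 0.0] for i in range(20)]   # hypothetical majority class
minority = [[5.0, 5.0], [5.5, 5.0], [5.0, 5.5]]  # hypothetical minority class
subsets = easy_ensemble_subsets(majority, minority)
models = [train_nearest_centroid(s) for s in subsets]
print(easy_ensemble_predict(models, [5.2, 5.1]))  # 1
print(easy_ensemble_predict(models, [0.5, 0.0]))  # 0
```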

Devina Itsnia Rizka

Klasifikasi data kanker serviks menggunakan metode naïve bayes dengan pemilihan fitur artificial bee colony = Cervical cancer classification using naïve bayes method with artificial bee colony as feature selection

"ABSTRACT

Cervical cancer is one of the most dangerous types of cancer. According to data from the Indonesian Ministry of Health (Departemen Kesehatan Republik Indonesia, Depkes RI), cervical cancer is one of the cancers with the highest prevalence in Indonesia, at 0.8. Early detection is therefore needed, here using a microarray dataset. Microarray datasets have many features, but not all of them are relevant, so feature selection is performed to improve accuracy. Artificial Bee Colony (ABC) is used for feature selection, after which the data are classified with the Naïve Bayes method. Without feature selection, the best Naïve Bayes accuracy is 60%, with 90% training data. With ABC feature selection, the highest Naïve Bayes accuracy is 93.33%, using 50 features and 90% training data."

2017

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library
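The classification step can be sketched as a Gaussian naïve Bayes classifier applied to a selected feature subset. In the thesis the subset comes from an Artificial Bee Colony search, which is not implemented here; the `mask` below is a hypothetical stand-in for its output. The Gaussian likelihood is also an assumption (the abstract does not name the naïve Bayes variant), and the data are toy values.

```python
import math


def fit_gaussian_nb(X, y):
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Per-feature variance with a small floor to avoid division by zero.
        vars_ = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                 for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X)), means, vars_)
    return model


def predict_gaussian_nb(model, x):
    def log_posterior(c):
        # log prior + sum of per-feature Gaussian log-likelihoods (naive independence)
        log_prior, means, vars_ = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, vars_))
    return max(model, key=log_posterior)


def select(X, mask):
    # Keep only the features whose mask bit is 1 (e.g. a subset chosen by ABC).
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


X = [[1.0, 9.0, 0.2], [1.2, 3.0, 0.1], [0.9, 5.0, 0.3],
     [3.0, 4.0, 0.2], [3.2, 8.0, 0.1], [2.9, 2.0, 0.3]]
y = [0, 0, 0, 1, 1, 1]
mask = [1, 0, 0]  # hypothetical selection result: keep feature 0 only
model = fit_gaussian_nb(select(X, mask), y)
print(predict_gaussian_nb(model, select([[3.1, 6.0, 0.2]], mask)[0]))  # 1
```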

Nathanael Matthew

Metode Robust untuk Mendeteksi Pothole Menggunakan Model Klasifikasi Random Forest = Robust Random Forest to Detect Potholes

"Various studies have developed smartphones as pothole detection tools because they enable crowdsourced data collection without special, expensive infrastructure. However, smartphone-based pothole detection is challenged by intrinsic uncertainty in the signals measured by different smartphone devices, and the method must be robust to this uncertainty for the potential of crowdsourced data collection to be realized. Although many studies have achieved satisfactory detection performance, various uncertainty factors still prevent existing methods from being fully robust. This study treats the potential uncertainty factors as predictors in developing a Random Forest-based pothole detection model, by using Euler angles to align accelerometer readings with the gravitational acceleration vector; applying the matrix profile to reduce pothole labeling errors and provide a priori information for efficient classification; and temporally discretizing the sensor data, with segment smoothing based on the detection platform's wheelbase (Detection Zone). Robustness is demonstrated with a multilevel factorial experiment varying sensor device specifications, routes and pothole severity levels, and sensor availability. The experiment shows that the uncertainty factors have statistically significant effects but do not affect the performance of the resulting models. Beyond being robust, the resulting classification models perform comparably to or better than other currently available smartphone-based pothole detection methods."

Depok: Fakultas Teknik Universitas Indonesia, 2022

T-pdf

UI - Tesis Membership Universitas Indonesia Library
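The pothole entry aligns accelerometer readings to the gravity vector using Euler angles. A minimal sketch, assuming the gravity direction in the phone's frame has already been estimated (for example by averaging readings while the vehicle is still): roll and pitch are computed from that estimate, and each sample is rotated so that gravity lies along the z axis. Yaw cannot be recovered from gravity alone and is ignored; the function names and the sample reading are hypothetical.

```python
import math


def euler_from_gravity(g):
    # Roll and pitch (radians) that bring the measured gravity vector onto +z.
    gx, gy, gz = g
    roll = math.atan2(gy, gz)
    pitch = math.atan2(-gx, math.hypot(gy, gz))
    return roll, pitch


def align_to_gravity(a, roll, pitch):
    ax, ay, az = a
    # Rotate about the x axis by roll: removes the y-component of gravity.
    y1 = math.cos(roll) * ay - math.sin(roll) * az
    z1 = math.sin(roll) * ay + math.cos(roll) * az
    # Rotate about the y axis by pitch: removes the x-component of gravity.
    x2 = math.cos(pitch) * ax + math.sin(pitch) * z1
    z2 = -math.sin(pitch) * ax + math.cos(pitch) * z1
    return x2, y1, z2


gravity = (1.2, -2.0, 9.5)  # hypothetical mean reading at rest, in m/s^2
roll, pitch = euler_from_gravity(gravity)
print(align_to_gravity(gravity, roll, pitch))  # x and y ≈ 0, z ≈ 9.78
```

After alignment, the z component of each sample is the vertical acceleration regardless of how the phone is mounted, which is the quantity a pothole detector typically inspects.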

Arvan Aulia Rachman

Klasifikasi data kanker menggunakan fuzzy c-means dengan pemilihan fitur menggunakan fisher's ratio = Classification of cancer data using fuzzy c means with feature selection using fisher's ratio

"Cancer data are classified to find the right therapy, one that maximizes efficacy and minimizes toxicity. Cancer data generally consist of many features, but not all of them are informative. The features are therefore ranked with Fisher's Ratio to select the most informative ones, and the best features form a new dataset. The data, before and after feature selection, are classified using Fuzzy C-Means, and the resulting accuracies are compared. Without feature selection, the average accuracy is 82.92%. With feature selection, the best result uses 150 features, with an average accuracy of 89.68%."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2016

S64140

UI - Skripsi Membership Universitas Indonesia Library
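Fisher's Ratio scores each feature by how far apart the class means are relative to the within-class spread. The sketch below uses the common two-class form (μ_A − μ_B)² / (σ_A² + σ_B²) and keeps the top-k features; the thesis may use a different (for example multi-class) variant, and the data are hypothetical. The Fuzzy C-Means classification step is not shown.

```python
def fisher_ratio(values, labels):
    """Two-class Fisher's ratio (mu_a - mu_b)^2 / (var_a + var_b) for one feature."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles the two-class case only")
    stats = []
    for c in classes:
        v = [x for x, y in zip(values, labels) if y == c]
        mu = sum(v) / len(v)
        stats.append((mu, sum((x - mu) ** 2 for x in v) / len(v)))
    (mu_a, var_a), (mu_b, var_b) = stats
    return (mu_a - mu_b) ** 2 / (var_a + var_b + 1e-12)  # epsilon guards zero variance


def top_k_features(X, labels, k):
    # Score every column and return the indices of the k highest-scoring features.
    scores = [fisher_ratio(list(col), labels) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


X = [[5.0, 1.0, 0.3], [5.2, 2.0, 0.1], [4.9, 1.5, 0.2],
     [1.0, 1.8, 0.2], [1.1, 1.2, 0.3], [0.8, 1.6, 0.1]]
labels = ["A"] * 3 + ["B"] * 3
print(top_k_features(X, labels, 2))  # [0, 1]
```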
