Search Results

Found 138047 documents matching the query

Aris Surya Yunata

Analisis Pengaruh Noise terhadap Performa K-Nearest Neighbors Algorithm dengan Variasi Jarak untuk Klasifikasi Beban Listrik = Analysis of the Effect of Noise on the Performance of the K-Nearest Neighbors Algorithm with Distance Variations in Electrical Load Classification

"Teknik NILM (Non-Intrusive Load Monitoring) digunakan dalam pemantauan konsumsi energi. Penerapan NILM digunakan untuk efisiensi energi, manajemen energi, dan diagnosis peralatan di rumah tangga, industri, atau penyedia energi. Variabel pengukuran yang digunakan yaitu daya aktif dan daya reaktif. Namun, data pengukuran sering kali terpengaruh oleh noise. Berbagai macam metode digunakan dalam NILM. Metode K-NN adalah salah satu metode machine learning yang banyak digunakan untuk klasifikasi beban listrik dalam teknik NILM dengan performa yang baik dan bersaing dengan metode lain yang lebih kompleks. Penelitian ini bertujuan untuk menganalisis pengaruh noise terhadap performa algoritma k-Nearest Neighbors (K-NN) dalam klasifikasi beban listrik. Berbagai tingkat noise secara random diberikan pada data pengukuran yang diperoleh sebesar 1% hingga 20%. Selanjutnya, model K-NN dilatih dan dievaluasi dengan nilai k = 1 sampai 25 dan menggunakan 15 variasi jarak. Dalam penelitian ini bahasa pemrograman Python digunakan untuk mengevaluasi performa K-NN. Hasil eksperimen menunjukkan bahwa penambahan noise pada data pengukuran secara signifikan mempengaruhi performa algoritma K-NN dalam mengklasifikasikan beban listrik. Pengaruh ini terlihat pada nilai akurasi, presisi, dan recall. Performa K-NN menurun hingga 15% yang didapatkan dari perbandingan nilai akurasi untuk data yang diberikan noise 20%. Nilai k yang memberikan akurasi maksimal adalah k = 25 untuk data yang diberikan noise. Nilai k yang memberikan presisi dan recall maksimal bernilai k = 3 untuk data yang diberikan noise. Dari ke-15 jenis jarak yang dipakai di metode K-NN pada penelitian ini, jarak Clark dan Divergence memiliki nilai akurasi di atas rata-rata, jarak Canberra memiliki nilai presisi di atas rata-rata, dan jarak Neyman Chi Squared memiliki nilai recall di atas rata-rata. Perbandingan performa antara metode K-NN dengan Random Forest dan Extra Trees Classifier juga telah dilakukan.
Dari hasil pengujian dan penelitian didapatkan bahwa metode K-NN memberikan performa yang baik untuk mendisagregasi data yang diberikan noise besar dibandingkan dengan metode Random Forest dan Extra Trees Classifier. Metode K-NN memiliki nilai akurasi lebih tinggi dibandingkan dengan metode Random Forest dan Extra Trees Classifier, dengan selisih mencapai 15%.

The NILM (Non-Intrusive Load Monitoring) technique is used to monitor energy consumption. NILM applications include energy efficiency, energy management, and appliance diagnostics for households, industry, and energy providers. The measurement variables used are active power and reactive power. However, measurement data are often affected by noise. Various machine learning methods are used in NILM. K-NN is one of the most widely used machine learning methods for classifying electrical loads, performing well and competing even with more complex methods. Python, a mainstay of data science, was used to run the machine learning algorithms. This study aims to analyze the effect of noise on the performance of the k-Nearest Neighbors (K-NN) algorithm in classifying electrical loads. Random noise at levels ranging from 1% to 20% was added to the measurement data. The K-NN model was then trained and evaluated with k values from 1 to 25, using 15 distance variations. Experimental results showed that adding noise to the measurement data significantly affected the performance of the K-NN algorithm in classifying electrical loads. This effect is reflected in the accuracy, precision, and recall values. K-NN performance decreased by up to 15%, as indicated by the accuracy comparison for data with 20% noise. The k value providing maximum accuracy was k = 25 for both low-noise and high-noise data. The k value providing maximum precision and recall was k = 3 for both low-noise and high-noise data. Among the 15 distances used with K-NN in this study, the Clark and Divergence distances had above-average accuracy, the Canberra distance had above-average precision, and the Neyman Chi-Squared distance had above-average recall.
Testing and research results showed that the K-NN method performs well in disaggregating data with high noise compared to the Random Forest and Extra Trees Classifier methods."
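
The experiment this abstract describes (adding random noise to power measurements, then classifying with K-NN under a chosen distance) can be illustrated with a minimal sketch. It is not the thesis code: the uniform relative noise model, the sample active and reactive power readings, and the function names `canberra`, `add_noise`, `knn_predict`, and `accuracy` are all assumptions made for illustration. Only the Canberra distance, one of the 15 distances the abstract mentions, is shown.

```python
import random
from collections import Counter


def canberra(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|), skipping 0/0 terms."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def add_noise(samples, level, rng):
    """Scale each value by a random factor in [1 - level, 1 + level] (assumed noise model)."""
    return [[v * (1 + rng.uniform(-level, level)) for v in s] for s in samples]


def knn_predict(train_x, train_y, query, k, dist=canberra):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Made-up (active power, reactive power) readings for two hypothetical loads.
train_x = [[60, 5], [62, 6], [58, 4], [500, 300], [510, 290], [495, 310]]
train_y = ["lamp", "lamp", "lamp", "motor", "motor", "motor"]
test_x = [[61, 5], [505, 295]]
test_y = ["lamp", "motor"]

rng = random.Random(0)
for level in (0.0, 0.05, 0.20):  # 0%, 5%, and 20% noise
    noisy_test = add_noise(test_x, level, rng)
    print(level, accuracy(train_x, train_y, noisy_test, test_y, k=3))
```

Passing a different function as `dist` would allow other distances to be compared in the same loop, which is roughly the kind of sweep over k and distance that the abstract reports.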

Depok: Fakultas Teknik Universitas Indonesia, 2024

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Raynaldi Suhaili

Pengenalan wajah menggunakan algoritma k-nearest neighbors (KNN) dengan ektraksi feature berdasarkan singular value decomposition (SVD) = Face recognition using k-nearest neighbors (KNN) algorithm with feature extraction based on singular value decomposition (SVD) / Raynaldi Suhaili

"ABSTRAK

Dalam beberapa tahun terakhir, kemajuan besar telah terjadi pada sistem pengenalan wajah. Banyak model yang telah diusulkan. Pada penelitian ini, uji coba dilakukan dengan model tertentu. Teknik Logarithm Transformation pertama-tama diterapkan untuk meningkatkan kualitas gambar wajah dan mengatasi variasi pencahayaan. Selanjutnya dilakukan proses ekstraksi fitur wajah dari gambar berdasarkan Singular Value Decomposition SVD . Nilai singular diambil sebagai fitur yang diasumsikan merepresentasikan gambar citra wajah. Kemudian, algoritma K-Nearest Neighbors KNN dijalankan untuk proses klasifikasi, sehingga menghasilkan persentase tingkat akurasi program. ORL faces database dipilih untuk menguji model program pengenalan wajah. Dalam penelitian ini, data uji menggunakan hasil ektraksi fitur SVD dibandingkan dengan data uji tanpa ekstraksi fitur. Dari hasil uji coba, diperoleh bahwa penggunaan data uji menggunakan hasil ekstraksi fitur SVD menghasilkan proses running time yang lebih cepat dibandingkan dengan menggunakan data tanpa ekstraksi fitur. Namun persentase tingkat akurasi rata-rata tertinggi yang didapatkan pada setiap iterasi terpilih, lebih baik hasilnya dengan data uji tanpa ektraksi fitur, yaitu sebesar 98,34 pada 90 data training, dibandingkan dengan data uji hasil ektraksi fitur SVD yang memperoleh persentase tingkat akurasi rata-rata sebesar 82,82 pada 90 data training.

ABSTRACT

In the past several years, major advances have occurred in face recognition systems, and many models have been proposed. In this study, experiments were carried out with one particular model. The Logarithm Transformation (LT) technique is first applied to enhance the face image and to handle lighting variations. Features are then extracted from the face image based on Singular Value Decomposition (SVD). The singular values are taken as features that are assumed to represent the face image. The K-Nearest Neighbors (KNN) algorithm is then run for classification, yielding the accuracy of the program. The ORL face database was chosen to test the face recognition model. In this research, test data with SVD feature extraction were compared with test data without feature extraction. The results show that test data with feature extraction give a faster running time than data without feature extraction. However, the highest average accuracy obtained over the chosen iterations was better for the test data without feature extraction: 98.34% at 90 training data, compared with 82.82% at 90 training data for the test data with SVD feature extraction."

2017

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

I Ketut Agung Enriko

Desain dan implementasi sistem machine-to-machine (M2M) pada pasien penyakit kardiovaskuler dengan fitur auto-rekomendasi menggunakan algoritma k-nearest neighbors (kNN) = Design and implementation of machine-to-machine (M2M) for cardiovascular patients with auto-recommendation system using k-nearest neighbor knn algorithm

"ABSTRAK

Penyakit kardiovaskuler adalah penyakit serius yang mematikan di mana seperempat kematian yang terjadi ternyata disebabkan oleh penyakit ini. Sementara itu, di negara berkembang seperti Indonesia kualitas layanan kesehatan masih rendah, ditandai dengan kurangnya tenaga dokter pada daerah-daerah rural dan terpencil. Kondisi ini menjadi motivasi perlunya merancang inovasi teknologi telemedical yang berfungsi membantu dokter melakukan diagnosis dan pengobatan penyakit kardiovaskuler. Penelitian ini mengusulkan sebuah sistem berbasis teknologi machine-to-machine M2M untuk mengecek kesehatan pasien yang akan melaporkan hasilnya ke dokter jantung secara jarak jauh melalui aplikasi website dan aplikasi mobile, yang diberi nama My Kardio. Desain dari sistem ini adalah terdiri dari tiga bagian utama yaitu bagian pasien patient site yang terdiri dari sensor-sensor dan gateway, bagian server server site yaitu server aplikasi web dan mobile yang terletak di cloud internet, dan bagian dokter doctor site yaitu aplikasi web dan mobile untuk dokter agar dokter dapat melakukan pengecekan dan diagnosis terhadap pasien secara online. Sistem ini dilengkapi dengan sistem prediksi auto-rekomendasi untuk memberikan rekomendasi kepada dokter dalam menentukan diagnosis penyakit yang diderita pasien. Sistem auto-rekomendasi ini dibangun dengan algoritma k-Nearest Neighbors kNN yang terbukti cukup baik performansinya dalam hal akurasi dan kecepatan. Uji coba telah dilakukan pada empat lokasi di daerah pinggiran Jakarta yaitu Kampung Banjarsari 10 pasien , Cibubur 15 pasien , Cimanggis 37 pasien , dan Pancoran 23 pasien pada total sejumlah 85 pasien. Evaluasi kuantitatif menghasilkan rata-rata akurasi prediksi sistem auto-rekomendasi adalah 76,47 , waktu pemrosesan sistem auto-rekomendasi 1 detik, dan performansi waktu transfer data dari lokasi pemeriksaan ke server M2M adalah 8,97 detik. 
Evaluasi secara kualitatif dilakukan melalui wawancara dokter spesialis jantung, dan diperoleh hasil bahwa aplikasi My Kardio sangat membantu terutama untuk daerah-daerah yang kekurangan dokter spesialis jantung; dan juga bermanfaat untuk kota besar di mana akses pasien ke dokter jantung juga terkendala oleh waktu praktek dokter yang terbatas dan kemacetan. Kata kunci:Machine-to-machine, penyakit kardiovaskuler, k-Nearest Neighbors.

ABSTRACT

Cardiovascular disease is a deadly disease that causes one-fourth of all deaths. Meanwhile, in developing countries such as Indonesia, the quality of health services is still low, marked by a lack of doctors to serve patients. This condition motivates the need for innovation to improve the life expectancy of cardiovascular patients in Indonesia with the help of technology. This research proposes a machine-to-machine (M2M) technology-based system, named My Kardio, that checks the health of patients and reports the results to a cardiologist remotely through web and mobile applications. The system is composed of three main parts: the patient site, consisting of sensors and gateways; the server site, a web and mobile application server located in the Internet cloud; and the doctor site, a web and mobile application that enables doctors to check and diagnose patients online. The system is equipped with an auto-recommendation prediction system that gives physicians recommendations for diagnosing the patient's illness. This auto-recommendation system is built on the k-Nearest Neighbors (kNN) algorithm, which has been shown to perform well in both accuracy and speed. Trials were performed at four locations in the suburbs of Jakarta: Kampung Banjarsari (10 patients), Cibubur (15 patients), Cimanggis (37 patients), and Pancoran (23 patients), for a total of 85 patients. Quantitatively, the average prediction accuracy of the auto-recommendation system is 76.47%, its processing time is 1 second, and the data transfer time from the examination location to the M2M server is 8.97 seconds.
Qualitative analysis was carried out through interviews with cardiologists, who found the My Kardio application very helpful, especially in remote areas lacking cardiologists, and also in big cities where patients' access to cardiologists is constrained by limited clinic hours and traffic jams."

Depok: Fakultas Teknik Universitas Indonesia, 2018

D2486

UI - Disertasi Membership Universitas Indonesia Library

Mufarrido Husnah

Klasifikasi sekuens protein coronavirus menggunakan Metode K-Nearest Neighbor dan seleksi fitur algoritma genetika = Classification of coronavirus protein sequences using K-Nearest Neighbor method and feature selection genetic algorithm

"Coronavirus (CoV) adalah keluarga virus penyebab penyakit sistem pernapasan ringan hingga berat pada berbagai spesies hewan termasuk manusia. Salah satu spesies Coronavirus yang muncul pada akhir tahun 2019 yaitu SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) dan menimbulkan penyakit baru bernama Covid-19 (Coronavirus disease-2019) kemudian berstatus pandemi. Penyebaran Covid-19 yang cepat dan dengan tingkat kematian yang tinggi terus terjadi di berbagai negara. Oleh karena itu, deteksi dini patogen perlu dilakukan secara cepat dengan menggunakan data sekuens protein Coronavirus. Sekuens protein merupakan data struktur primer dari suatu protein yang memiliki 27 fitur berdasarkan discere. Dalam penerapannya, tidak semua fitur relevan dengan data yang digunakan sehingga perlu seleksi fitur untuk menghindari dimensi data yang tinggi dan tidak optimal. Seleksi fitur algoritma genetika memberikan fitur-fitur optimal pada data dan metode K-Nearest Neighbor (KNN) melakukan klasifikasi data sekuens protein Coronavirus dengan fitur hasil seleksi fitur algoritma genetika. Seleksi fitur algoritma genetika menghasilkan 11 fitur optimal yang meningkatkan performa running time metode klasifikasi KNN menjadi 0,0541 detik. Fitur optimal diperoleh dari karakteristik AA-count , secondary structure fraction , isoelectric point dan instability index. Hasil terbaik performa akurasi, spesifisitas beserta sensitifitas secara berurutan yaitu 96,68%, 98,7% dan 94,4% yang diperoleh pada nilai parameter K=3.

Coronaviruses (CoV) are a family of viruses that cause mild to severe respiratory diseases in various animal species, including humans. One Coronavirus species, SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), emerged at the end of 2019 and caused a new disease called Covid-19 (Coronavirus disease-2019), which then reached pandemic status. Covid-19 continues to spread rapidly, with a high death rate, in various countries. Therefore, early detection of pathogens needs to be done quickly using Coronavirus protein sequence data. A protein sequence is the primary structure data of a protein and has 27 features, but not all of them are relevant to the data used, so feature selection is necessary to avoid high and suboptimal data dimensionality. Genetic algorithm feature selection provides the optimal features, and the K-Nearest Neighbor (KNN) method classifies the Coronavirus protein sequence data using the selected features. The genetic algorithm feature selection produces 11 optimal features that improve the running time of the KNN classification method, with an average running time of 0.0541 seconds. The best accuracy, specificity, and sensitivity are 96.68%, 98.7%, and 94.4%, respectively, obtained at the parameter value K=3."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Sheila Nuur Ditrie

Klasifikasi Gangguan Depresi pada Sinyal Elektroensefalografi (EEG) Menggunakan Algoritma K-Nearest Neighbor (KNN) = Classification of Major Depressive Disorder On Electroencephalography (EEG) Using K-Nearest Neighbor (KNN) Algorithm

"Penderita gangguan depresi semakin meningkat setiap tahunnya, terutama pada generasi muda. Hal ini membawa urgensi tentang pentingnya menjaga kesehatan mental, terlebih lagi WHO melaporkan bahwa depresi sangat mempengaruhi kualitas hidup dan menjadi penyebab dari meningkatnya risiko gangguan kesehatan lainnya. Kesalahan diagnosis seringkali terjadi pada depresi, maka dari itu sangat penting untuk mengembangkan pendekatan objektif untuk membantu dokter mendiagnosis depresi secara lebih efektif. Elektroensefalografi (EEG) merupakan teknologi berbasis sinyal otak yang dapat merekam aktivitas jaringan otak. Penelitian ini bertujuan untuk membuat program analisis gangguan depresi berbasis Machine Learning. Aplikasi Graphical User Interface (GUI) juga dibuat untuk mempermudah pengguna. Pemrosesan sinyal dilakukan dengan dua metode, yakni wavelet dan Power Spectral Density (PSD). Relative Power Ratio (RPR) dihitung sebagai fitur klasifikasi. Perhitungan dominansi juga dilakukan untuk mereduksi jumlah fitur. Fitur dengan dominansi tertinggi akan digunakan untuk membuat model klasifikasi Machine Learning. Pengklasifikasi yang digunakan adalah K-Nearest Neighbor (KNN) dengan cross validation. Akurasi tertinggi yang diperoleh mencapai 70% dengan metode wavelet dan 65% dengan metode PSD.

The number of individuals suffering from depressive disorder (also known as major depressive disorder or MDD) is increasing every year, especially among the younger generations. This highlights the urgency of prioritizing mental health, especially considering the World Health Organization’s report that depression significantly affects quality of life and increases the risk of other health disorders. Misdiagnosis often occurs in cases of depression, making it crucial to develop an objective approach to help doctors diagnose depression more effectively. Electroencephalography (EEG) is a brain signal-based technology that records brain network activity. This research aims to create a machine learning-based program for analyzing depressive disorders. Additionally, a Graphical User Interface (GUI) application is developed to make the program easier to use. Signal processing is performed using two methods, namely wavelet and Power Spectral Density (PSD). The Relative Power Ratio (RPR) is calculated as a classification feature. Dominance computation is also conducted to reduce the number of features, and the features with the highest dominance are used to create the Machine Learning classification model. The classifier used is K-Nearest Neighbor (KNN) with cross-validation. The highest accuracy achieved is 70% with the wavelet method and 65% with the PSD method."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Velery Virgina Putri Wibowo

Klasifikasi tumor otak menggunakan metode K-Nearest Neighbor-Genetic Algorithm dan Support Vector Machine-Genetic Algorithm = Classification of brain tumor using K-Nearest Neighbor-Genetic Algorithm and Support Vector Machine-Genetic Algorithm methods

Kemunculan suatu penyakit merupakan masalah yang tak terhindarkan di seluruh dunia, termasuk di Indonesia. Tumor otak merupakan salah satu penyakit berbahaya yang dapat menyebabkan kematian. Salah satu jenis penyakit tumor otak yang paling umum dan mematikan adalah glioblastoma. Penderita glioblastoma memiliki tingkat kelangsungan hidup yang cukup rendah dan umumnya didiagnosis pada saat tumor sudah berkembang lebih jauh. Oleh karena itu, sangat penting dilakukan diagnosis secara dini dengan hasil yang akurat untuk menentukan apakah seseorang menderita glioblastoma atau tidak. Pada penelitian ini, metode machine learning, yaitu K-Nearest Neighbor dan Support Vector Machine dengan seleksi fitur Genetic Algorithm (KNN-GA dan SVM-GA) diterapkan dan dibandingkan untuk mengklasifikasi glioblastoma. Genetic Algorithm (GA) diimplementasikan sebagai seleksi fitur untuk menentukan fitur-fitur relevan yang terpilih dan kemudian diklasifikasi dengan metode KNN dan SVM. Data yang digunakan adalah data numerik hasil Magnetic Resonance Imaging (MRI) yang didapat dari RSUPN Dr. Cipto Mangunkusumo. Berdasarkan percobaan yang dilakukan, metode SVM-GA menggunakan kernel Radial Basis Function dan 5 fitur dengan 90% data training adalah metode terbaik untuk mengklasifikasi data glioblastoma. Hasil yang didapat untuk nilai akurasi, recall, presisi, dan f1-score secara berturut-turut adalah 92.35%, 93.19%, 92.62%, dan 92.83%.

The emergence of disease is an inevitable problem throughout the world, including in Indonesia. A brain tumor is a dangerous disease that can cause death. One of the most common and deadly types of brain tumor is glioblastoma. Patients with glioblastoma have a fairly low survival rate and are generally diagnosed when the tumor has already progressed. Therefore, it is very important to make an early and accurate diagnosis to determine whether a person has glioblastoma. In this study, two machine learning methods, K-Nearest Neighbor and Support Vector Machine with Genetic Algorithm feature selection (KNN-GA and SVM-GA), were applied and compared for classifying glioblastoma. The Genetic Algorithm (GA) was implemented as a feature selection step to determine the relevant features, which were then classified by the KNN and SVM methods. The data used are numerical data obtained from Magnetic Resonance Imaging (MRI) results from Dr. Cipto Mangunkusumo Hospital. Based on the experiments conducted, the SVM-GA method using a Radial Basis Function kernel and 5 features with 90% training data is the best method for classifying glioblastoma. The results obtained for accuracy, recall, precision, and f1-score were 92.35%, 93.19%, 92.62%, and 92.83%, respectively."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Valentinus Paramarta

Perbandingan kinerja Support Vector Machine (SVM), Decision Tree, dan k-Nearest Neighbors untuk memprediksi adiksi internet dan kondisi kesehatan mental: studi kasus mahasiswa Saintek Universitas Indonesia = Performance comparison of Support Vector Machine (SVM), Decision Tree, and k-Nearest Neighbors for internet addiction and mental health status prediction: a study case of STEM Students at Universitas Indonesia

"Semakin tinggi penetrasi penggunaan Internet seseorang, maka akan semakin berpotensi terkena Gangguan Adiksi Internet (GAI) yang dapat berdampak buruk pada status kesehatan mental penggunanya. Mayoritas penduduk Indonesia telah menggunakan layanan Internet selama 2 sampai 3 tahun dengan penggunaan rata-rata di atas 8 jam per hari. Hal tersebut menunjukkan penggunaan Internet dan potensi dampaknya pada kesehatan mental di Indonesia penting untuk diperhatikan sedini mungkin. Penelitian lain menunjukkan bahwa tingkat kesehatan mental yang dialami seseorang dapat mempengaruhi perilaku penggunaan Internetnya, sehingga menyebabkan munculnya keinginan yang tidak terkendali dan berlebihan dalam pengaksesan Internet. Secara tidak langsung, hal tersebut menyatakan bahwa kesehatan mental seseorang juga dapat diamati melalui tingkah laku serta kebiasaan seseorang dalam menggunakan Internet. Prediksi GAI dan gangguan kesehatan mental mahasiswa UI dilakukan dengan menggunakan algoritma pemelajaran mesin Support Vector Machine (SVM) berdasarkan perilaku penggunaan Internet yang dilakukan. Sampel diambil dari mahasiswa UI rumpun Ilmu Saintek (Ilmu Komputer, Teknik, dan MIPA). Data yang diambil adalah riwayat penelusuran halaman website yang diakses oleh mahasiswa dan hasil kuesioner Internet addiction test (IAT) dan General Health Questionnaire (GHQ-12). Riwayat penelusuran website dijadikan himpunan fitur yang merepresentasikan perilaku penggunaan Internet responden, sedangkan hasil skor kuesioner IAT dan GHQ-12 digunakan untuk menjadi ground truth atau label pada dataset. Tahapan preprocessing yang dilakukan adalah metode Synthetic Minority Over-Sampling Technique (SMOTE) untuk mengatasi ketidakseimbangan persebaran data pada kelas data yang digunakan. Metode SVM selanjutnya dibandingkan dengan performa metode lainnya seperti Decision Tree dan k-Nearest Neighbor (kNN). Untuk meningkatkan performa akurasinya, peneliti menggunakan metode grid search untuk mendapatkan parameter terbaik. Proses validasi dilakukan menggunakan cross-validation pada metode grid search. Hasil yang didapatkan menunjukkan bahwa performa akurasi tertinggi pada SVM untuk memprediksi GAI adalah 88% pada dataset kedua.
Saat dilakukan perbandingan hasil dengan metode pemelajaran mesin Decision Tree dan kNN, didapatkan performa nilai akurasi tertinggi dicapai pada metode Decision Tree dengan nilai akurasi sebesar 96%. Sedangkan untuk prediksi gangguan kesehatan mental, metode SVM mendapatkan nilai performa akurasi tertinggi sebesar 71% pada dataset gabungan. Saat dilakukan perbandingan hasil performa akurasi dengan Decision Tree dan kNN, didapatkan nilai performa akurasi tertinggi dicapai pada metode kNN sebesar 72%. Hasil penelitian ini menunjukkan bahwa metode grid search meningkatkan performa SVM, Decision Tree, dan kNN karena adanya perubahan nilai parameter.

Excessive Internet usage can lead to Internet Addiction Disorder (IAD), which affects the user's mental health. The majority of Indonesians have used Internet services for 2 to 3 years, with an average use of more than 8 hours per day. This suggests that increased Internet usage may be associated with an increase in mental disorders. Other research shows that a person's level of mental health can influence their Internet usage behavior, causing an uncontrolled and excessive desire to access the Internet. It can therefore be inferred that mental health can also be observed through a person's behavior and habits in using the Internet. This study predicts the IAD and mental health disorder status of UI students using machine learning based on the Support Vector Machine (SVM) algorithm, with Internet usage behavior as the input. Samples were taken from Universitas Indonesia students with a Science and Technology background. Data were collected before and after the exam period. The data included the history of websites accessed by students and the results of questionnaires based on the Internet Addiction Test (IAT) and the General Health Questionnaire (GHQ-12). The students' website history was used as the feature set representing Internet usage behavior, while the IAT and GHQ-12 questionnaire results were used as labels. In preprocessing, the Synthetic Minority Over-Sampling Technique (SMOTE) was used to overcome the imbalance in the class distribution. The website history was then analyzed with the SVM algorithm to predict IAD and mental health status. The study also compared SVM with other algorithms, namely Decision Tree and k-Nearest Neighbor (kNN). The models were optimized using the grid search method to obtain the best parameters, and validated using cross-validation within the grid search. The highest SVM accuracy for predicting Internet addiction was 88%, on the second dataset. In comparison with the other models, Decision Tree obtained the highest accuracy, 96%, for predicting Internet addiction. For predicting mental health disorder, SVM reached its highest accuracy of 71% on the combined dataset. When compared with Decision Tree and kNN, the highest accuracy, 72%, was achieved by kNN. The results indicate that the grid search method improved the performance of SVM, Decision Tree, and kNN through changes in parameter values."
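
The SMOTE preprocessing step mentioned in this abstract can be sketched as follows. This is a simplified illustration of the general idea (new minority-class points are interpolated between an existing minority sample and one of its nearest minority neighbours), not the implementation used in the study. The function name `smote_sketch`, its parameters, and the sample points are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Each new point lies on the segment between a randomly chosen sample and
    one of its k nearest minority neighbours. Needs at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up two-feature minority samples.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
for point in smote_sketch(minority, n_new=4):
    print(point)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region already covered by the minority class, which is why the technique is often preferred over plain duplication.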

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Annisa Kamalia

Klasifikasi data talasemia menggunakan K-nearest neighbor dan naive bayes = Classification of data thalassemia using K-nearest neighbor and naive bayes

"ABSTRAK

Talasemia adalah penyakit yang disebabkan oleh adanya kelainan dalam hemoglobin. Penyakit talasemia merupakan penyakit herediter atau penyakit keturunan dimana pembawa gen talasemia adalah orang tua dari penderita. Di Indonesia, pada tahun 2015 diketahui jumlah kasus talasemia mencapai 7.029 kasus. Sampai saat ini talasemia belum dapat disembuhkan namun dapat dikenali sifat pembawanya dengan skrining. Dalam tugas akhir ini, akan dibandingkan performa dari dua metode yang digunakan untuk mengklasifikasikan data talasemia, yaitu K-Nearest Neighbor dan Naive Bayes. Data yang digunakan adalah 82 data pasien talasemia dan 68 data pasien non-talasemia dari Rumah Sakit Anak dan Bunda Harapan Kita, Jakarta Barat. Hasil akhir menunjukkan bahwa metode Naive Bayes memberikan nilai akurasi yang lebih besar dari K-Nearest Neighbor dalam mengklasifikasikan talasemia. Rata-rata akurasi Naive Bayes sebesar 99.775% dengan rata-rata waktu running 0.0554 detik dan rata-rata akurasi K-Nearest Neighbor adalah 97.142% dengan rata-rata waktu running 0.081 detik. Untuk nilai spesifikasi, keduanya memberikan performa yang sama, yaitu dari K-Nearest Neighbor diperoleh ketika K=3 yaitu sebesar 100% dan dari Naive Bayes sebesar 100%. Hasil rata-rata sensitivitas tertingi diberikan oleh Naive Bayes yaitu sebesar 99.59%, sedangkan K-Nearest Neighbor sebesar 96.25% untuk K=1.

ABSTRACT

Thalassemia is a disease caused by abnormalities in hemoglobin. It is a hereditary disease in which the carriers of the thalassemia gene are the parents of the sufferer. In Indonesia, the number of thalassemia cases reached 7,029 in 2015. Thalassemia cannot yet be cured, but carrier status can be identified by screening. This final project compares the performance of two methods for classifying thalassemia data, namely K-Nearest Neighbor and Naive Bayes. The data comprise 82 thalassemia patients and 68 non-thalassemia patients from Harapan Kita Children and Women's Hospital, West Jakarta. The final results show that the Naive Bayes method provides higher accuracy than K-Nearest Neighbor in classifying thalassemia. The average accuracy of Naive Bayes is 99.775% with an average running time of 0.0554 seconds, and the average accuracy of K-Nearest Neighbor is 97.142% with an average running time of 0.081 seconds. For specificity, both give the same performance: 100% for K-Nearest Neighbor at K = 3 and 100% for Naive Bayes. The highest average sensitivity is given by Naive Bayes at 99.59%, while K-Nearest Neighbor reaches 96.25% at K = 1."
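
The accuracy, sensitivity, and specificity figures reported in this abstract follow from the counts of a two-class confusion matrix. A minimal sketch of that arithmetic is given below. The function name `binary_metrics` and the label vectors are made up for illustration and are not taken from the thesis. The sketch assumes both classes occur in `y_true`, so it has no guard against division by zero.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # positive case correctly detected
        elif truth == positive:
            fn += 1  # positive case missed
        elif pred == positive:
            fp += 1  # negative case wrongly flagged
        else:
            tn += 1  # negative case correctly rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Made-up labels: 1 = thalassemia, 0 = non-thalassemia.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))  # (0.875, 0.75, 1.0)
```

In this toy case one patient is missed, so sensitivity drops to 0.75 while specificity stays at 1.0, the same pattern of perfect specificity with lower sensitivity that the abstract reports for K-Nearest Neighbor.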

2019

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Puja Romulus

Analisis implementasi optical character recognition pada aksara batak dengan menggunakan prinsip K-nearest neighbour = Analysis of optical character recognition implementation for batak characters using K-nearest neighbour / Puja Romulus

"ABSTRACT

This undergraduate thesis aims to support the preservation of national cultural assets, particularly ancient writing systems or scripts. The implementation takes a technological approach in the field of image processing. The object of study is the ancient script of the Batak people. The idea is implemented as a program that detects the characters in an image of a noise-free Batak-script document. The program processes the image through four phases: segmentation, preprocessing, feature extraction, and classification. Feature extraction uses Geometric Moment Invariant, while classification uses K-Nearest Neighbour. The tests report two results: the accuracy of character reading and the processing time. Accuracy ranges from 42% to 96%, and processing time ranges from 1.9 to 34 seconds."
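To illustrate the feature-extraction step, the sketch below computes the first two moment invariants of a small binary glyph. It assumes that "Geometric Moment Invariant" refers to Hu-style invariants built from normalized central moments; the function name and the toy glyph are hypothetical and not taken from the thesis.

```python
def hu_first_two(image):
    """Return (phi1, phi2) for a 2D list of non-negative pixel intensities."""
    pixels = [(x, y, v) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    # Centroid of the glyph.
    cx = sum(x * v for x, _, v in pixels) / m00
    cy = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment mu_pq, normalized for scale by m00^(1 + (p+q)/2).
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    eta20, eta02, eta11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2


# Toy L-shaped glyph, the same glyph shifted inside a larger image,
# and the glyph mirrored across its diagonal.
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
shifted = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
transposed = [list(col) for col in zip(*glyph)]

features = hu_first_two(glyph)
features_shifted = hu_first_two(shifted)
features_transposed = hu_first_two(transposed)
```

Because the three feature tuples agree up to floating-point error, a K-Nearest Neighbour classifier fed with such features would treat a character the same regardless of where it sits in the segmented image.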

Fakultas Teknik Universitas Indonesia, 2014

S56323

UI - Skripsi Membership Universitas Indonesia Library

Nurul Shabrina

Metode Bicluster Berbasis k-Nearest Neighbors dan Robust Least Squares Estimation menggunakan Principal Components (bi-KNN-RLSP) untuk imputasi Missing values pada Data Ekspresi Gen = Missing values Imputation for Microarray Data Using Bicluster-Based k-Nearest Neighbors and Robust Least Squares Estimation with Principal Components (bi-KNN-RLSP)

"Microarray is a technology in biology that provides information about gene expression. Raw microarray data take the form of images, which must be converted into a gene expression matrix in which rows represent genes and columns represent experimental conditions. In practice, however, microarray data often contain missing values, which hinder data analysis. Imputation is one way to handle missing values in microarray data: the missing entries of the data matrix are predicted or estimated so that a complete data matrix is obtained. The imputation method used in this study is called bi-KNN-RLSP, which combines biclustering, principal component analysis, and quantile regression. Forming the biclusters requires a temporary complete matrix, which is obtained through pre-imputation with KNNimpute. The bi-KNN-RLSP experiments were carried out on gene expression data from cervical cancer cell lines at missing rates of 1%, 5%, 10%, 15%, 20%, 25%, and 30%, using k=10 in the KNNimpute pre-imputation step. Performance was evaluated using the normalized root mean squared error (NRMSE). Averaged over five runs, the NRMSE of bi-KNN-RLSP is lower than that of the bi-RLSP and row average methods. The computation times of bi-KNN-RLSP and bi-RLSP do not differ significantly, so bi-KNN-RLSP achieves a smaller NRMSE than bi-RLSP at comparable cost. It can therefore be said that replacing the row average pre-imputation in bi-RLSP with KNNimpute yields better imputation performance. In addition, the NRMSE of bi-KNN-RLSP increases as the missing rate increases."
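The pre-imputation and evaluation steps described in this abstract can be sketched roughly as follows. This is a simplified, unweighted variant of k-nearest-neighbour imputation and one common definition of NRMSE (root of mean squared error divided by the variance of the true values); the thesis may use a different weighting or normalization. The function names and the toy matrix are assumptions for illustration only.

```python
import math


def knn_impute(matrix, k):
    """Fill None entries with the mean of the k nearest rows that have the value."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(matrix)
                if r != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:  # Leave the entry as None if no usable neighbour exists.
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def nrmse(true_values, imputed_values):
    """sqrt(mean squared error / variance of the true values)."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    var = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / var)


# Toy expression matrix: rows are genes, columns are conditions.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # hypothetical missing entry
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
completed = knn_impute(data, k=2)
```

In an experiment like the one described, entries would be removed at a chosen missing rate, imputed, and then compared against the held-out true values with `nrmse`; a lower value indicates better imputation.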

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
