Search Results

Found 168169 documents matching the query

Bagaskara Ghanyvian Istiqlal

Analisa dan seleksi fitur model k-nearest neighbor untuk mengklasifikasi kualitas tidur berdasarkan dataset pmdata = Features selection analysis from k-nearest neighbor model for sleep quality classification based on pmdata dataset.

"Good-quality sleep is essential to many aspects of life, including physical health, mental health, safety, concentration, performance, and recovery, among others. Sleep quality covers not only physiological aspects but also mental ones, such as how a person feels after sleeping, satisfaction with sleep, and the effect on daily life. This study proposes combining objective data from Fitbit with subjective questionnaires to classify sleep quality using K-Nearest Neighbor, with the aim of identifying the features that most influence sleep quality. Objective data (physiological data and sleep aspects measured by Fitbit) and subjective data on mental aspects are both used as descriptive features in the model. The most influential features are analyzed from two model viewpoints: one with subjective sleep quality as the target feature and one with objective sleep quality as the target feature. Both models are trained after a series of data preprocessing steps that include feature selection and feature extraction. Feature selection based on the ANOVA F-test is compared with feature extraction using Principal Component Analysis (PCA) and Neighborhood Component Analysis (NCA). ANOVA F-test feature selection outperforms PCA and NCA, raising scores by 0.06-0.08 in the objective model and 0.01-0.06 in the subjective model. The best score for the subjective model is 0.52, with number of features = 3 and k-neighbors = 27. The best score for the objective model is 0.72, with number of features = 7 and k-neighbors = 4. In the end, the study identifies the 3 most influential features for subjective classification and the 7 most influential features for objective classification."
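As a rough illustration of the pipeline the abstract describes (score each feature with a one-way ANOVA F-test, keep the top-scoring ones, then classify with K-Nearest Neighbor), here is a minimal pure-Python sketch. It is not the thesis code: the feature values, the labels `poor` and `good`, and all function names are hypothetical, and the thesis's actual features and parameters are not reproduced.

```python
from collections import Counter, defaultdict

def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature, grouped by class label.
    Assumes at least two classes and more samples than classes."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_features(rows, labels, n_features):
    """Return the column indices with the highest F scores."""
    n_cols = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:n_features]

def knn_predict(train_rows, train_labels, query, k, columns):
    """Majority vote among the k nearest rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((r[j] - query[j]) ** 2 for j in columns), y)
        for r, y in zip(train_rows, train_labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 do not.
rows = [
    [1.0, 5.0, 0.3], [1.2, 1.0, 0.1], [0.9, 4.0, 0.2],
    [3.0, 5.5, 0.2], [3.1, 0.8, 0.3], [2.9, 4.2, 0.1],
]
labels = ["poor", "poor", "poor", "good", "good", "good"]

top = select_top_features(rows, labels, n_features=1)
prediction = knn_predict(rows, labels, [1.1, 3.0, 0.2], k=3, columns=top)
print(top, prediction)
```

Selecting columns before computing distances matters for KNN in particular, because every retained feature contributes to the distance whether or not it carries class information.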

Depok: Fakultas Teknik Universitas Indonesia, 2020

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Nabila Dita Putri

Pembangunan Data dan Model Analisis Emosi Fine-Grained pada Teks Media Sosial Berbahasa Indonesia = Fine-Grained Emotion Analysis on Indonesian Social Media Text: Dataset and Models

"Currently, no research in Indonesia uses fine-grained emotion for emotion analysis. In addition, the datasets available for emotion analysis in Indonesia are still limited in terms of the amount of data, the range of emotions, and their sources. In this study, the researchers built a large dataset for emotion analysis on Indonesian-language text, collected from a variety of domains and sources. The dataset contains 33 thousand texts, consisting of tweets collected from Twitter and comments collected from Instagram and Youtube posts. The domains covered are sports, entertainment, and life chapter. Thirty-six annotators annotated the dataset with fine-grained emotion labels under a multi-label scheme, where the labels come from a new emotion taxonomy proposed by the researchers. The taxonomy consists of 44 fine-grained emotions grouped into six basic emotions. The researchers also built baseline models for emotion analysis. Two baseline models were obtained: a fine-tuned IndoBERT model, which achieved the highest micro F1-score of 0.3786, and a hierarchical logistic regression model, which achieved the highest exact match ratio of 0.2904. Both baseline models were also evaluated across domains to see how general and robust they are. The dataset and baseline models produced in this study are expected to be valuable resources for future research."
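The two metrics reported in the abstract, micro F1-score and exact match ratio, can be sketched for multi-label predictions as follows. This is an illustrative pure-Python version under the usual definitions, not the evaluation code from the thesis; the emotion labels in the toy data are hypothetical placeholders rather than entries from the proposed taxonomy.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all samples and labels before computing F1."""
    tp = fp = fn = 0
    for true_set, pred_set in zip(y_true, y_pred):
        tp += len(true_set & pred_set)
        fp += len(pred_set - true_set)
        fn += len(true_set - pred_set)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true set."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Hypothetical labels, one set per text.
y_true = [{"joy", "pride"}, {"anger"}, {"sadness", "fear"}]
y_pred = [{"joy"}, {"anger"}, {"sadness", "anger"}]

print(micro_f1(y_true, y_pred), exact_match_ratio(y_true, y_pred))
```

Exact match ratio is the stricter of the two: a prediction that gets one of two labels right still counts as a miss, which is one reason it tends to be lower than micro F1 on fine-grained label sets.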

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Kaysa Syifa Wijdan Amin

Pembangunan Data dan Model Analisis Emosi Fine-Grained pada Teks Media Sosial Berbahasa Indonesia = Fine-Grained Emotion Analysis on Indonesian Social Media Text: Dataset and Models

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Gilang Catur Yudishtira

Pembangunan Data dan Model Analisis Emosi Fine-Grained pada Teks Media Sosial Berbahasa Indonesia = Fine-Grained Emotion Analysis on Indonesian Social Media Text: Dataset and Models

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Shinta Nataya Paramesti

Studi analisis eigenface dan eigen fuzzy set untuk ekstraksi ciri bibir pada sistem identifikasi wajah

"Face identification based on lip features affects the success of retrieving a person's face image, because variations in lip shape can distinguish one individual from another. To speed up the search for criminal offenders, a face identification application based on lip features is needed. Such a system must be able to extract lip features from a digital image using a feature extraction method that is both accurate and fast.

This study analyzes the performance of the eigenface method against the eigen fuzzy set method for lip feature extraction in a face identification system. Eigenface is a feature extraction method with proven success in extracting facial features, while the eigen fuzzy set method was developed from fuzzy set theory and can be used for image analysis. An automatic lip detection method based on color features is also evaluated for its effectiveness in image retrieval. The analysis uses descriptive and inferential statistics. Experiments were run under two scenarios, distinguished by whether the lip images came from manual or automatic segmentation.

The experiments show that automatic detection successfully detected lips in only 61.4% of cases, and that precision-recall for face retrieval in scenario 2 was lower than in scenario 1. The eigen fuzzy set method has a lower computation time than the eigenface method, whereas the highest precision-recall was produced by the eigenface method, with an average value of 0.22%. From these results it is concluded that eigenface is more effective than eigen fuzzy set as a feature extraction method. A face identification system using eigenface for feature extraction could in future be developed into a face identification system based on facial components."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2007

T-Pdf

UI - Tesis Membership Universitas Indonesia Library

Hendrico Kristiawan

Klasifikasi Domain Spesialisasi Dokter pada Data Teks Forum Tanya Jawab Kesehatan = Classification of Doctor Specialization Domain in Health Question and Answer Forum Text Data

"Consultation questions on an online forum need to be answered by the appropriate specialist doctor so that the answers are accurate and useful to the users who ask them. This study therefore discusses the development of a model that can automatically route a health consultation question to a doctor with the matching specialization. The model is a multi-label classification model, because a question can be associated with more than one specialization. Several issues are addressed in this work. The research begins by evaluating how effectively a rule-based mapping method predicts expert-annotated data; the results show a satisfactory level of success. Next, a machine learning model is developed to classify doctor specialization domains. The model is trained with various methods, including supervised, unsupervised, and semi-supervised learning. The best model is obtained through domain-adaptive pre-training, using IndoBERT-large as the reference model and involving unsupervised learning. Supervised learning with conventional models is also used, and its results serve to analyze the contribution of the features used in classification. Lastly, the research re-evaluates the human annotations using a keyword-based approach to reduce errors in the dataset. With this approach, several annotation errors were found in the human-annotated dataset."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Liou, James J.H

Improving airline service quality based on rough set theory and flow graphs/ Liou, James J.H ; Chuang, Yen-Ching ; Hsu, Chao-Che

"This study differs from previous studies by applying multivariate statistical analysis and multi-criterion decision-making methods to the improvement of service quality. We use the rough set theory (RST) with a flow graph approach to determine customer attitudes regarding service quality, which can assist managers in developing strategies to improve service quality and thus satisfy the needs of customers. A set of rules is derived from a large sample of airline customers, and its predictive ability is evaluated. The flow graph and the cause-and-effect relationship of the decision rules are heavily exploited in service quality characteristics. As compared with the results of other data-mining analyses, our results are encouraging. This study demonstrates that the combination of the RST model and flow graphs assists in identifying the needs of customers, determining their characteristics, and facilitating the development of an improvement strategy."
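For readers unfamiliar with rough set theory, its core construction, the lower and upper approximation of a target set, can be sketched in a few lines of Python. This is a generic textbook illustration, not the authors' procedure: the passengers, the attributes `seat` and `punctual`, and the target set are hypothetical, and the flow graph step is not covered.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Lower and upper approximations of `target` with respect to the
    indiscernibility classes induced by `attributes`."""
    classes = defaultdict(set)
    for name, description in objects.items():
        key = tuple(description[a] for a in attributes)
        classes[key].add(name)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # every member is in the target: certainly in
            lower |= block
        if block & target:    # at least one member is in the target: possibly in
            upper |= block
    return lower, upper

# Hypothetical survey data: p1 and p2 are indiscernible on these attributes.
passengers = {
    "p1": {"seat": "good", "punctual": "yes"},
    "p2": {"seat": "good", "punctual": "yes"},
    "p3": {"seat": "poor", "punctual": "yes"},
    "p4": {"seat": "poor", "punctual": "no"},
    "p5": {"seat": "good", "punctual": "no"},
}
satisfied = {"p1", "p3"}

lower, upper = approximations(passengers, ["seat", "punctual"], satisfied)
print(sorted(lower), sorted(upper))
```

Objects in the lower approximation support certain decision rules, while those in the upper approximation but not the lower one form the boundary region, where only possible rules can be stated.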

Taylor and Francis, 2016

658 JIPE 33:2 (2016)

Artikel Jurnal Universitas Indonesia Library

Mufarrido Husnah

Klasifikasi sekuens protein coronavirus menggunakan Metode K-Nearest Neighbor dan seleksi fitur algoritma genetika = Classification of coronavirus protein sequences using K-Nearest Neighbor method and feature selection genetic algorithm

"Coronaviruses (CoV) are a family of viruses that cause mild to severe respiratory diseases in various animal species, including humans. One coronavirus species, SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), emerged at the end of 2019 and caused a new disease called Covid-19 (Coronavirus disease-2019), which was later declared a pandemic. Covid-19 continues to spread rapidly, with a high death rate, in many countries. Early detection of the pathogen therefore needs to be carried out quickly using coronavirus protein sequence data. A protein sequence is the primary structure data of a protein and has 27 features based on discere. In practice, not all features are relevant to the data used, so feature selection is needed to avoid high, suboptimal data dimensionality. Genetic algorithm feature selection provides the optimal features, and the K-Nearest Neighbor (KNN) method classifies the coronavirus protein sequence data using the selected features. Genetic algorithm feature selection yields 11 optimal features that improve the running time of the KNN classifier; the average running time is 0.0541 seconds. The optimal features come from the AA-count, secondary structure fraction, isoelectric point, and instability index characteristics. The best accuracy, specificity, and sensitivity are 96.68%, 98.7%, and 94.4%, respectively, obtained at the parameter value K=3."
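The three reported metrics follow directly from confusion-matrix counts. The sketch below shows the standard binary definitions in plain Python; it assumes a two-class setting with both classes present, which may differ from how the thesis computed them, and the label vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate). Assumes both classes occur in y_true."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels: 1 = target class, 0 = other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```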

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Fenni Amalia

Metode Biclustering Terurut Berbasis k-Nearest Neighbour, Mean Square Residual, dan Jarak Euclidean dalam Imputasi Missing Values pada Data Ekspresi Gen = Sequential Biclustering Method Based on k-Nearest Neighbor, Mean Squared Residual, and Euclidean Distance in Missing Values Imputation on Gene Expression Data

"Bioinformatics is a field devoted to analyzing biological information. In bioinformatics research, one way data is obtained is through microarray technology. Microarray technology is used in molecular biology to observe differences in gene expression levels by converting monochrome images containing hundreds or even thousands of genes from cell samples into gene expression data. Because of technical errors, microarray technology often produces gene expression data with missing or undetected values, so an imputation method is needed to handle the missing values. This study develops an imputation method called Sequential Biclustering based on k-Nearest Neighbor, Mean Squared Residual, and Euclidean Distance (SBi-kNN-MSREimpute). It is a biclustering-based imputation method in which biclusters are formed according to a criterion involving the Mean Squared Residue score and Euclidean distance. k-Nearest Neighbor is used as a pre-imputation method because gene expression data often has complex patterns that are hard to detect, which calls for an approach that can map the correlation structure in the data. k-Nearest Neighbor accounts for correlation in microarray data by selecting a set of genes whose expression profiles are similar to that of the gene to be imputed (the target gene). In this study, SBi-kNN-MSREimpute is applied to gene expression data from COVID-19 patients who underwent daily rapid tests. Its performance is evaluated using NRMSE and compared with that of the SBi-MSREimpute method. The results show that SBi-kNN-MSREimpute performs better than SBi-MSREimpute at every missing rate across different levels of c. The optimal value of c for imputing missing values in the COVID-19 data is c = 10% for missing rates of 25%, 30%, and 40%, and c = 15% for missing rates of 5%, 10%, 15%, 20%, and 50%. The results also show that the NRMSE for SBi-kNN-MSREimpute is relatively stable, even for data with missing rates as high as 50%."
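The Mean Squared Residue score mentioned in the abstract is commonly defined, for a bicluster with rows I and columns J, as the average of (a_ij - a_iJ - a_Ij + a_IJ)^2 over the submatrix, where a_iJ, a_Ij, and a_IJ are the row, column, and overall means. The sketch below implements that standard definition in plain Python; it is not the SBi-kNN-MSREimpute procedure itself, and the expression matrix is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by `rows` and `cols`.
    A score near 0 means the submatrix is close to an additive pattern."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(r[j] for r in sub) / n_rows for j in range(n_cols)]
    overall = sum(sum(r) for r in sub) / (n_rows * n_cols)
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)

# Hypothetical expression matrix: genes 0-2 follow one additive pattern, gene 3 does not.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],
]

coherent = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
with_outlier = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2])
print(coherent, with_outlier)
```

A low score for a candidate bicluster suggests that the genes in it vary together across the chosen conditions, which is what makes their observed values useful for estimating a missing entry.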

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Romli Noor Ahmad

Suatu tinjauan tentang subset fuzzy dan aplikasinya

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 1985

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library
