Search Result

Found 4 document(s) matching the query

Diyah Septi Andryani

Implementasi hybrid clustering menggunakan algoritma fuzzy c-means dan algoritma divisive untuk menganalisis kekerabatan dna human papillomavirus penyebab kanker serviks = The implementation of hybrid clustering using fuzzy c means algorithm and divisive algorithm for analysing dna human papillomavirus cause of cervical cancer

"Clustering bertujuan untuk mengklasifikasikan pola yang berbeda ke dalam kelompok yang disebut cluster. Analisis gen dengan menggunakan metode clustering dinilai lebih akurat dibandingkan analisis nukleotida menggunakan penyejajaran DNA. Hybrid clustering pada tesis ini mengkombinasikan algoritma fuzzy c-means dan algoritma divisive mampu meningkatkan keakurasian jika dibandingkan pendekatan pengelompokan partitional tradisional. Algoritma divisive akan dijalankan pada step kedua setelah hasil clustering yang diperoleh dari pengelompokan partisi fuzzy c-means.

Penentuan jumlah cluster terbaik ditentukan dari nilai Indeks Davies Bouldin yang paling minimum. Sebanyak 1252 barisan DNA HPV Human papillomavirus diperoleh dari Genbank NCBI dengan proses melakukan ekstraksi ciri DNA, selanjutnya dilakukan normalisasi. Proses ekstraksi ciri, normalisasi, dan penerapan algoritma partisi fuzzy c-means dan divisive dalam metode hybrid clustering menggunakan bantuan program open source.

Pada hasil hybrid clustering level awal diperoleh jumlah cluster optimum sebanyak 3 cluster dengan nilai Indeks Davies Bouldin paling minimum adalah 0.9715919. Pada level ke-2 clustering didapatkan cluster ke-1 terbagi atas 9 sub cluster dengan nilai IDB minimum adalah 0.8909797. Cluster ke-2 terbagi atas 2 sub cluster dengan nilai IDB minimum adalah 0.7650508. Cluster 3 terbagi atas 2 sub cluster dengan nilai IDB minimum adalah 0.9112528. Nilai IDB pada level kedua selalu lebih kecil dibanding nilai IDB pada level 1. Hal ini mengindikasikan bahwa hybrid clustering memberikan hasil yang lebih baik terhadap hasil clustering.

Clustering aims to classify different patterns into groups called clusters. Gene analysis using clustering methods is considered more accurate than nucleotide analysis using DNA alignment. In this thesis, a hybrid clustering algorithm that combines the fuzzy c-means and divisive algorithms improves accuracy compared with traditional partitional clustering. The divisive algorithm is applied at the second level, after partitional clustering with fuzzy c-means.
The best number of clusters is determined by the minimum value of the Davies-Bouldin Index (DBI) of the clustering results. The data comprise 1252 HPV DNA sequences obtained from the GenBank database of the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov in FASTA format. The data are converted into numerical form through feature extraction using n-mer frequencies.
At the first level of hybrid clustering, the optimum number of clusters is three, with a minimum Davies-Bouldin Index of 0.9715919. Moreover, the DBI values obtained after the second clustering step are always smaller than those obtained from the first step alone. This indicates that the hybrid approach in this study produces better clustering results in terms of DBI values."
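The abstract above mentions converting DNA sequences into numerical form through n-mer frequency feature extraction. A minimal Python sketch of that idea follows. The function name `nmer_frequencies`, the default `n=3`, and the choice to skip windows containing ambiguous bases are assumptions made for illustration; the thesis does not state which settings or program it used.

```python
from itertools import product


def nmer_frequencies(sequence, n=3, alphabet="ACGT"):
    """Return the relative frequency of every n-mer over `alphabet`.

    The feature order is fixed (product order over `alphabet`), so every
    sequence maps to a vector of the same length, len(alphabet) ** n.
    """
    seq = sequence.upper()
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    # Slide a window of width n across the sequence.
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        # Assumption: windows with symbols outside the alphabet (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    # Each of the four bases appears once, so every 1-mer has frequency 0.25.
    print(nmer_frequencies("ACGT", n=1))
```

Vectors produced this way could then be normalized and passed to a clustering algorithm, which is the role the abstract gives to feature extraction.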

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2017

T47171

UI - Tesis Membership Universitas Indonesia Library

Situmeang, Jason Nimrod Joshua

Clustering Varian Sekuens Protein SARS-CoV-2 Menggunakan Algoritma BIRCH dengan Seleksi Fitur LASSO = Clustering of SARS-CoV-2 Protein Sequence Variants Using BIRCH Algorithm with LASSO Feature Selection

Penelitian ini bertujuan untuk melakukan pengelompokan varian virus SARS-CoV-2 melalui proses clustering menggunakan metode unsupervised learning. Data yang digunakan adalah sekuens protein SARS-CoV-2 yang diekstraksi fiturnya menggunakan paket Discere dalam bahasa pemrograman Python. Sebanyak 27 fitur dihasilkan dan diseleksi dengan metode seleksi fitur Least Absolute Shrinkage and Selection Operator (LASSO). Metode Elbow digunakan untuk menentukan jumlah cluster yang optimal. Dalam penelitian ini, digunakan metode clustering K-Means dan Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). Evaluasi hasil clustering dilakukan menggunakan metrik evaluasi Silhouette Score dan Davies-Bouldin Index, serta memperhatikan waktu runtime untuk setiap simulasi. Hasil evaluasi kemudian dibandingkan untuk melihat perbedaan performa antara kedua metode clustering yang digunakan, serta pengaruh seleksi fitur terhadap performa clustering. Hasil terbaik diperoleh pada simulasi dengan metode clustering BIRCH + LASSO, dengan nilai Silhouette Score 0,74186 untuk jumlah cluster k=4 dan 0,73207 untuk k=5. Nilai Davies-Bouldin Index terbaik juga diperoleh pada simulasi tersebut, yaitu 0,42697 untuk k=4 dan 0,37949 untuk k=5. Waktu runtime terbaik tercatat pada simulasi dengan metode K-Means + LASSO, yaitu 0,21551 detik untuk k=4 dan 0,17539 detik untuk k=5. Dapat disimpulkan bahwa metode BIRCH menghasilkan cluster yang lebih baik berdasarkan metrik evaluasi, namun K-Means memberikan proses clustering yang lebih cepat. Seleksi fitur dengan metode LASSO juga membantu meningkatkan performa clustering.

This study aims to perform clustering of SARS-CoV-2 virus variants using unsupervised learning methods. The data used consists of SARS-CoV-2 protein sequences whose features are extracted using the Discere package in the Python programming language. A total of 27 features are generated and selected using the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection method. The Elbow method is employed to determine the optimal number of clusters for the clustering process. The clustering methods used in this research are K-Means clustering and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). The clustering results are evaluated using the Silhouette Score and Davies-Bouldin Index metrics, while also considering the runtime for each simulation. The evaluation results are then compared to examine the performance differences between the two clustering methods and the impact of feature selection on clustering performance. The best Silhouette Score is obtained in the simulation using the BIRCH + LASSO clustering method, with a value of 0.74186 for k=4 and 0.73207 for k=5. The best Davies-Bouldin Index is also achieved in the same simulation, with values of 0.42697 for k=4 and 0.37949 for k=5. The fastest runtime is recorded in the simulation using the K-Means + LASSO method, with a time of 0.21551 seconds for k=4 and 0.17539 seconds for k=5. In conclusion, the BIRCH method yields better clustering results based on the evaluation metrics, while K-Means provides faster clustering processes. The LASSO feature selection method also aids in improving clustering performance.
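Several of the records listed here evaluate clusterings with the Davies-Bouldin Index, where a lower value indicates more compact and better separated clusters. The sketch below computes it in plain Python using the common definition (mean, over clusters, of the worst ratio of summed within-cluster scatter to centroid distance). The function name `davies_bouldin` and the toy data are illustrative assumptions, not taken from the study.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for points (sequences of floats) and their cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        lab: [sum(coord) / len(members) for coord in zip(*members)]
        for lab, members in clusters.items()
    }
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        # Note: this divides by zero if two centroids coincide.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # well separated grouping
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # poor grouping, larger value
```

Choosing the number of clusters by this criterion amounts to running the clustering for several values of k and keeping the one with the smallest index.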

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Evan Haryowidyatna

Analisis Pengelompokan Kabupaten dan Kota di Pulau Jawa Sebagai Sasaran Industri Sepeda Motor dengan Metode Partitional Hard Clustering = Clustering Analysis of Districts and Cities in The Island of Java as Targets of Motorcycle Industry Using Partitional Hard Clustering Method

"Per 9 Februari 2023, 87% dari total populasi kendaraan pribadi di Indonesia merupakan sepeda motor. Persebaran sepeda motor terpadat di Indonesia berada di Pulau Jawa dengan persentase sebesar 60%. Tingginya populasi sepeda motor dan fakta bahwa 80% rumah tangga di Pulau Jawa sudah memiliki sepeda motor membuat pasar sepeda motor semakin mengecil. Dalam jangka panjang, kondisi ini dapat berdampak buruk bagi industri sepeda motor yang terus ingin berkembang. Penelitian ini membahas tentang pengelompokan kabupaten dan kota di Pulau Jawa berdasarkan karakteristik demografinya. Kemudian, diberikan saran keputusan yang dapat dilakukan oleh industri sepeda motor berdasarkan kelompok kabupaten dan kota yang terbentuk menggunakan teknik clustering. Hal ini bertujuan agar produsen yang bergerak di industri sepeda motor dapat memfokuskan produknya pada kelompok kabupaten dan kota yang memiliki potensi terbaik. Terdapat 12 variabel demografi yang digunakan dalam penelitian ini, dan variabel tersebut terbagi menjadi tiga kategori: kondisi ekonomi masyarakat, kondisi kehidupan masyarakat, dan kondisi demografis daerah. Metode yang digunakan dalam penelitian ini adalah metode partitional hard clustering. Sebelumnya, dilakukan pembuatan dataset melalui proses data scrapping pada situs terpercaya, dan dilanjutkan dengan proses Exploratory Data Analysis (EDA) pada dataset. Setelah dataset terbentuk, dilakukan pengelompokan dengan metode partitional hard clustering yang terdiri dari metode K-Means Clustering dan metode K-Medoids Clustering. Kemudian, dilakukan evaluasi cluster untuk menentukan metode clustering yang paling sesuai dengan menggunakan empat metrik evaluasi yaitu Indeks Silhouette, Indeks Dunn, Indeks Davies Bouldin, dan Indeks Calinski Harabasz. Didapatkan hasil bahwa metode K-Medoids Clustering dengan 5 kelompok merupakan yang terbaik untuk mengelompokkan kabupaten dan kota di Pulau Jawa. 
Setelah kelompok terbentuk, setiap kelompok diberikan rekomendasi keputusan yang sebaiknya diambil oleh industri sepeda motor. Terdapat 4 rekomendasi yang dapat diberikan, yaitu distribusi suku cadang, pembuatan bengkel, penjualan sepeda motor kelas menengah ke atas, dan penjualan sepeda motor kelas menengah ke bawah.

As of February 9, 2023, 87% of the total population of private vehicles in Indonesia consists of motorcycles. The densest distribution of motorcycles in Indonesia is found on the Island of Java, with a percentage of 60%. The high population of motorcycles and the fact that 80% of households in Java already have motorcycles are causing the motorcycle market to shrink. In the long run, this condition can have negative impacts on the motorcycle industry that continues to seek growth. This research focuses on the clustering of regencies and cities in Java based on their demographic characteristics. Subsequently, decision recommendations will be provided for the motorcycle industry based on the formed groups using clustering techniques. The aim is to enable manufacturers in the motorcycle industry to focus their products on regencies and cities with the best potential. There are 12 demographic variables used in this research, divided into three categories: the economic conditions of society, the living conditions of society, and the demographic conditions of the region. The method used in this research is the partitional hard clustering method. Firstly, a dataset is created through the data scraping process on trusted sites, followed by the Exploratory Data Analysis (EDA) process on the dataset. Once the dataset is formed, clustering is performed using the partitional hard clustering method, consisting of the K-Means Clustering and K-Medoids Clustering methods. Subsequently, cluster evaluation is carried out to determine the most suitable clustering method using four evaluation metrics: Silhouette Index, Dunn Index, Davies Bouldin Index, and Calinski Harabasz Index. The results show that the K-Medoids Clustering method with 5 clusters is the best for grouping regencies and cities in Java. After the groups are formed, each group is given decision recommendations that the motorcycle industry should consider. 
There are four recommendations: spare parts distribution, workshop establishment, sales of mid- to high-end motorcycles, and sales of mid-range motorcycles and below."
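Of the four evaluation metrics named in this abstract, the Dunn Index is the one least often provided by standard libraries. The sketch below is one plain-Python reading of it: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so higher is better. Other variants of the index exist, and the abstract does not say which one the study used; the function name `dunn_index` and the toy data are assumptions for illustration.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn Index: minimum inter-cluster distance / maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; the index is undefined")

    # Smallest distance between two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for ca, cb in combinations(list(clusters.values()), 2)
        for a in ca
        for b in cb
    )
    return min_separation / max_diameter


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(dunn_index(pts, [0, 0, 1, 1]))  # compact, well separated clusters
    print(dunn_index(pts, [0, 1, 0, 1]))  # stretched, overlapping clusters
```

Because this version compares all pairs of points, its cost grows quadratically with the number of observations, which is unlikely to matter for a dataset of regencies and cities.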

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Hans Carlo Adrianto

Klasterisasi Provinsi Penghasil Kelapa Sawit di Indonesia dengan Metode Hierarchical Clustering = Clustering of Palm Oil Producing Provinces in Indonesia Using the Hierarchical Clustering Method

"Indonesia merupakan produsen utama minyak kelapa sawit di dunia sekaligus penyumbang devisa terbesar dari sektor perkebunan. Namun demikian, produktivitas kelapa sawit nasional menunjukkan tren penurunan, meskipun luas areal tanam terus meningkat setiap tahunnya. Penurunan ini disebabkan oleh berbagai faktor, seperti tingginya proporsi pohon tua yang belum diremajakan, rendahnya efektivitas program replanting terutama di perkebunan rakyat, terbatasnya akses petani terhadap benih unggul dan teknologi modern, serta masih maraknya peredaran benih palsu. Penelitian ini bertujuan untuk mengelompokkan provinsi-provinsi penghasil kelapa sawit berdasarkan karakteristik produktivitas dan luas tanaman menghasilkan (TM), guna memberikan dasar bagi rekomendasi kebijakan yang lebih tepat sasaran. Metode yang digunakan adalah Hierarchical Clustering dengan jumlah klaster (k) = 2, yang dipilih karena memberikan struktur klaster yang terpisah dengan baik dan mudah divisualisasikan. Metode ini dibandingkan dengan K-Means Clustering dan divalidasi melalui pengukuran validitas internal (Silhouette Score = 0,298; Calinski-Harabasz Index = 9,105; Davies-Bouldin Index = 1,477; dan Dunn Index = 0,457) serta validitas eksternal (Adjusted Rand Index = 0,419; F-value = 60,64; dan p-value = 5,07e-08). Hasil validasi menunjukkan klaster yang terbentuk cukup representatif dan stabil. Hasil klasterisasi menghasilkan dua kelompok utama, yaitu Klaster 1 berlabel “Kebun Kurang Produktif dan Tidak Stabil” dan Klaster 2 berlabel “Kebun Produktif dan Efisien”. Klaster 1 terdiri dari provinsi-provinsi yang didominasi oleh perkebunan rakyat dengan produktivitas rendah, fluktuasi hasil panen tinggi, dan tren pertumbuhan negatif. Klaster ini juga menunjukkan keterbatasan luas tanaman menghasilkan serta nilai UMR yang lebih rendah dari rata-rata nasional. 
Sebaliknya, Klaster 2 terdiri dari wilayah dengan produktivitas tinggi dan stabil, didukung oleh pengelolaan kebun yang efisien dan akses terhadap teknologi yang memadai. Berdasarkan hasil tersebut, penelitian ini merekomendasikan intervensi kebijakan khusus untuk Klaster 1, di antaranya percepatan program replanting, penguatan akses terhadap benih unggul dan teknologi pertanian, serta peningkatan kapasitas manajemen kebun dan sumber daya manusia (SDM). Penelitian ini diharapkan dapat memberikan kontribusi terhadap strategi pengembangan sektor kelapa sawit nasional melalui pendekatan kebijakan berbasis data.

Indonesia is the world's leading producer of palm oil and the largest contributor to foreign exchange earnings from the plantation sector. However, national palm oil productivity is showing a downward trend, even though the area under cultivation continues to increase every year. This decline is attributed to various factors, including a high proportion of old trees that have not been replanted, low effectiveness of replanting programs, particularly in smallholder plantations, limited access for farmers to high-quality seeds and modern technology, and the widespread circulation of counterfeit seeds. This study aims to group palm oil-producing provinces based on productivity characteristics and productive area (PA) to provide a basis for more targeted policy recommendations. The method used is Hierarchical Clustering with the number of clusters (k) = 2, chosen because it provides a well-separated cluster structure that is easy to visualize. This method was compared with K-Means Clustering and validated through internal validity measurements (Silhouette Score = 0.298; Calinski-Harabasz Index = 9.105; Davies-Bouldin Index = 1.477; and Dunn Index = 0.457) as well as external validity (Adjusted Rand Index = 0.419; F-value = 60.64; and p-value = 5.07e-08). The validation results indicate that the clusters formed are sufficiently representative and stable. The clustering yields two main groups: Cluster 1, labeled “Less Productive and Unstable Plantations”, and Cluster 2, labeled “Productive and Efficient Plantations”. Cluster 1 consists of provinces dominated by smallholder plantations with low productivity, highly fluctuating harvest yields, and negative growth trends. This cluster also has a limited productive area and a regional minimum wage (UMR) below the national average. Conversely, Cluster 2 consists of regions with high and stable productivity, supported by efficient plantation management and adequate access to technology. 
Based on these findings, this study recommends specific policy interventions for Cluster 1, including accelerating replanting programs, strengthening access to high-quality seeds and agricultural technology, and enhancing plantation management capacity and human resources (HR). This study is expected to contribute to national palm oil sector development strategies through a data-driven policy approach."

Depok: Fakultas Teknik Universitas Indonesia, 2025

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
