Search Results

Found 21986 documents matching the query

Bouveyron, Charles

Model-based clustering and classification for data science: with applications in R

"Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics."

Cambridge: Cambridge University Press, 2019

e20520634

eBooks Universitas Indonesia Library

Gan, Guojun

Data clustering: theory, algorithms, and applications

"Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results.

The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB programming languages."

Philadelphia: Society for Industrial and Applied Mathematics, 2007

e20448780

eBooks Universitas Indonesia Library

Wu, Junjie

Advances in K-means clustering: a data mining thinking

"This book addresses these challenges and makes novel contributions in establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adapting right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. This book not only enriches the clustering and optimization theories, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based has won the "2010 National Excellent Doctoral Dissertation Award", the highest honor for not more than 100 PhD theses per year in China."

Berlin: Springer-Verlag, 2012

e204063793

eBooks Universitas Indonesia Library

Survey of text mining II: clustering, classification, and retrieval

London: Springer, 2008

005.741 SUR

Buku Teks SO Universitas Indonesia Library

Nabila Safitri

Analisis Biclustering Iterative Signature Algorithm (ISA) pada Data Kemiskinan di Pulau Sulawesi Tahun 2022 = Analysis of Biclustering Iterative Signature Algorithm on Poverty Data in Sulawesi Island in 2022

"Poverty in Indonesia remains a problem that must be addressed every year. According to the March 2022 Susenas report, Sulawesi Island ranks third among the six major islands of Indonesia by percentage of the population living in poverty. This shows that many people on Sulawesi Island still experience poverty, so the government needs to adopt appropriate policies to overcome it. One approach is clustering, namely grouping the districts/cities of Sulawesi Island based on poverty variables. The objective of this research is to group the data in two directions, that is, to group districts/cities and their variables simultaneously. Grouping districts/cities and variables simultaneously will make it easier for the government to design policies to overcome poverty. The appropriate method for this is biclustering, which groups observations and characteristics simultaneously so that the resulting biclusters can each be characterized by different features. One biclustering algorithm is the Iterative Signature Algorithm (ISA). Clustering with ISA requires an upper threshold value and a lower threshold value. The threshold values determine whether a district/city and the variables can be included in a bicluster. The best result is selected based on the average Mean Squared Residue (MSR) per volume. Biclustering analysis of the 2022 poverty data for Sulawesi Island using ISA produces 2 biclusters. Based on these results, the government is expected to make appropriate policies for the problems found in bicluster 1 and bicluster 2."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Constrained clustering: Advances in algorithms, theory, and applications

London: CRC Press, 2009

519.53 CON

Buku Teks SO Universitas Indonesia Library

Gabriela Patricia Winny Gracia

Pengkajian metode User-Input-Free Density-Based Clustering (UIFDBC) pada data dengan missing values dan penerapannya pada data real = Assessment of the User-Input-Free Density-Based Clustering (UIFDBC) method on data with missing values and its application to real data

"Clustering is a method for identifying natural groups in data based on similarity measures such as Euclidean distance. It aims to group data so that observations within the same cluster are highly similar, while observations in different clusters differ markedly. In 2021, Chowdhury, Bhattacharyya, & Kalita developed the User-Input-Free Density-Based Clustering (UIFDBC) method, building on earlier density-based clustering methods. As the name suggests, UIFDBC does not require input from the user to find clusters, and thus addresses the dependence of earlier clustering methods on user input. The purpose of this study is to discuss the UIFDBC method in more depth, to apply it to real data, namely credit card consumer data for consumer segmentation, and to examine its performance on data containing missing values. The UIFDBC method was successfully applied to the credit card consumer data and yielded eight clusters of users, each with its own characteristics. In addition, the assessment on data with missing values indicates that the performance of UIFDBC is fairly good when the proportion of missing values is ≤ 5%. However, it should be noted that the results of each iteration will be random, because the UIFDBC method depends heavily on data density, while data density depends on the missing values, which are generated completely at random."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Anderberg, Michael R.

Cluster Analysis for Applications

New York: Academic Press, 1973

519.53 AND c

Buku Teks SO Universitas Indonesia Library

Foreman, John W.

Data smart: using data science to transform information into insight

Indianapolis: Wiley, 2014

005.72 FOR d;005.72 FOR d (2)

Buku Teks SO Universitas Indonesia Library

Fahira Puti Adylla

Pengelompokan negara-negara berdasarkan indikator-indikator kebahagiaan dunia = Clustering of countries based on the indicators of world happiness

"Happiness is a term that refers to affection, well-being, contentment, the experience of joy, and admiration. Happiness is measured using subjective and objective indicators. Subjective indicators measure people's emotional experience of the events that occur in their lives, while objective indicators measure material well-being based on economic, social, political, and health aspects. This study discusses the clustering of countries based on the indicators of world happiness in 2021. The eight indicators used for clustering are GDP per capita, social support, healthy life expectancy, freedom of life, negative perception of corruption, generosity, crime index, and cost of living. The study uses the K-Means and Fuzzy C-Means methods to cluster the countries and then identifies which of the two gives the better clustering. The clusters from the better method are mapped using the Biplot method. The optimal number of clusters for both methods is 3, determined with the Silhouette index for K-Means and the modified partition coefficient for Fuzzy C-Means. Based on the ratio of within-cluster to between-cluster standard deviation, K-Means is the better method, with a ratio of 0.4413. Cluster 1 consists of 35 countries, dominated by countries in Sub-Saharan Africa and South Asia; cluster 2 consists of 68 countries, dominated by countries in Latin America, the Commonwealth of Independent States, and Central and Eastern Europe; and cluster 3 consists of 30 countries, dominated by countries in Western Europe, North America, and Australia. The Biplot mapping of the three clusters explains 64.2 percent of the variability in the data.

Cluster 1 tends to have a high crime index, high generosity, and a high negative perception of corruption. Cluster 2 tends to have a high crime index, a high negative perception of corruption, high GDP per capita, high healthy life expectancy, and high social support. Cluster 3 tends to have high freedom of life, high cost of living, a high happiness index, high social support, high healthy life expectancy, and high GDP per capita."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
