[ABSTRACT Based on 2014 WHO data, an estimated 15 million people worldwide who are infected with hepatitis B (HBsAg+) are also infected with hepatitis D. Hepatitis D infection can occur simultaneously with hepatitis B (coinfection) or after a person has contracted chronic hepatitis B (superinfection). Hepatitis B is caused by the hepatitis B virus (HBV) and hepatitis D by the hepatitis D virus (HDV). HDV cannot survive without HBV. Because hepatitis D is so closely tied to HBV infection, it is realistic to expect that any effort to prevent hepatitis B also indirectly prevents hepatitis D. This thesis discusses the clustering of HBV DNA sequences using the k-means clustering algorithm in R. The process starts by collecting HBV DNA sequences from GenBank. Features are then extracted using n-mer frequencies, and the extracted features are collected in a matrix and normalized to the interval [0, 1] with min-max normalization; the normalized matrix serves as the input data. The number of clusters is set to two, and the initial centroids are chosen at random. In each iteration, the Euclidean distance from every object to each centroid is calculated, and each object is assigned to the cluster whose centroid is nearest, until two convergent clusters are formed. The result is that the HBV viruses in the first cluster are more virulent than those in the second cluster, so the HBV viruses in the first cluster could potentially evolve with HDV to become a cause of hepatitis D.] |
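The pipeline described in the abstract (n-mer frequency extraction, min-max normalization to [0, 1], then k-means with two clusters, random initial centroids, and Euclidean distance) can be sketched roughly as follows. This is a minimal illustration only, not the thesis code: the thesis used R, whereas Python is used here for convenience. The function names `nmer_frequencies`, `min_max_normalize`, and `kmeans`, the choice n = 2, the fixed random seed, and the toy sequences are all assumptions made for this example; the abstract does not state the value of n or any implementation details.

```python
import itertools
import math
import random


def nmer_frequencies(seq, n=2):
    """Count overlapping n-mers over the alphabet ACGT, in a fixed order."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    return [counts[k] for k in kmers]


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k=2, max_iter=100, seed=0):
    """Lloyd-style k-means: random initial centroids, Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct objects at random
    labels = None
    for _ in range(max_iter):
        # Assign every object to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # memberships stopped changing: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy sequences, made up for illustration (not HBV data from GenBank).
sequences = ["AAAAAAAAAT", "AAAAAAATAT", "GCGCGCGCGC", "GCGCGCGGCC"]
features = min_max_normalize([nmer_frequencies(s, n=2) for s in sequences])
labels, centroids = kmeans(features, k=2)
print(labels)  # one cluster index (0 or 1) per sequence
```

Because the initial centroids are random, the cluster indices (which group is called 0 and which 1) depend on the seed, and k-means can settle in a local optimum. In practice it is common to run it from several random starts and keep the solution with the smallest within-cluster sum of squares.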