Search Result

Found 6 document(s) matching the query

Rosalia Deviana Cahyaningrum

Implementasi metode spectral clustering-partitioning around medoids (PAM) dengan algoritma similaritas paralel berbasis cuda pada data microarray gen karsinoma = Implementation of spectral clustering partitioning around medoids (PAM) method with parallel similarity algorithm based on cuda in microarray data of carcinoma genes

"This research implements spectral clustering-partitioning around medoids (PAM) with a serial similarity algorithm, and implements a CUDA-based parallel similarity algorithm within the spectral clustering method, on microarray data of carcinoma genes. The spectral clustering-PAM algorithm with the serial similarity algorithm is implemented in the open-source software R, while the parallel similarity algorithm is implemented in CUDA. Clustering begins by normalizing the data with min-max normalization. In the spectral clustering-PAM algorithm, the similarity between carcinoma genes is calculated first. Next, the normalized Laplacian matrix is formed from the diagonal matrix and the unnormalized Laplacian matrix. The eigenvalues of the normalized Laplacian matrix are then calculated, and the eigenvectors of its k smallest eigenvalues are arranged into a new dataset whose rows are partitioned using PAM. In terms of running time, calculating the similarity values in parallel in CUDA is 378 times faster than calculating them serially in R. The results show that spectral clustering-PAM groups the carcinoma gene microarray data into two clusters with an average silhouette value of 0.6458276."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2017

T47172

UI - Tesis Membership Universitas Indonesia Library
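The first steps named in the abstract above (min-max normalization, pairwise similarity, normalized Laplacian) can be sketched in plain Python. This is only an illustrative sketch, not the thesis's R or CUDA code: the abstract does not say which similarity measure was used, so the Gaussian kernel and its `sigma` parameter are assumptions, and all function names are hypothetical. The eigen-decomposition and PAM steps are omitted; in practice they would come from a numerical library.

```python
import math


def min_max_normalize(rows):
    """Scale each column (feature) of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def gaussian_similarity(rows, sigma=1.0):
    """Pairwise similarity matrix W (Gaussian kernel is an assumption)."""
    n = len(rows)
    w = [[0.0] * n for _ in range(n)]  # zero diagonal: no self-similarity
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def normalized_laplacian(w):
    """L_sym = D^(-1/2) (D - W) D^(-1/2), with D the diagonal degree matrix."""
    n = len(w)
    degree = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [
        [inv_sqrt[i] * ((degree[i] if i == j else 0.0) - w[i][j]) * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]
```

The double loop in `gaussian_similarity` is the quadratic-cost step; each pair (i, j) is independent of the others, which is why a similarity computation of this shape lends itself to one-thread-per-pair parallelization on a GPU.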

Robertus Hudi

Improving performance of PHD function implementation on target association with Multistatic Radar System (MRS) using CUDA parallel computing = Peningkatan performa implementasi fungsi PHD untuk asosiasi target pada sistem radar multistatis (MRS) menggunakan CUDA parallel computing

"The improvements in this experiment target three factors: running time, memory efficiency, and speedup. The speedup achieved is on the order of 100× over the original algorithm. Naïve parallelization is used to map each data matrix to CUDA memory, and each major operation runs in parallel through custom CUDA kernels that adapt to the data dimensions. This accounts for the improvement in the second factor, memory efficiency. Kernel results were captured with NVIDIA profiling tools for an increasing number of random targets on 4 transmitter-receiver (PV) combinations, without any knowledge of the approximate target directions. All results are based on the average running time of kernel calls and the speedup for each input size, compared with the serial and CPU-parallel versions from previous work. Among advanced techniques for target association in passive radar systems, several experiments have been based on the Probability Hypothesis Density (PHD) function. Its complex calculation makes the computation very demanding, so this work focuses on comparing the performance of the PHD function between preceding attempts and an implementation in pure C with the CUDA library. Further improvement is highly possible through optimization of the PHD algorithm itself or through more advanced parallelization techniques."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

T-pdf

UI - Tesis Membership Universitas Indonesia Library
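The abstract above reports speedup per input size, computed from average kernel running times against a baseline. A minimal sketch of that arithmetic follows; the function name is hypothetical and the timings are made-up placeholders, not measurements from the thesis.

```python
from statistics import mean


def speedup_by_input_size(baseline_times, accelerated_times):
    """Return {input_size: mean baseline time / mean accelerated time}."""
    return {
        size: mean(baseline_times[size]) / mean(accelerated_times[size])
        for size in baseline_times
        if size in accelerated_times
    }


# Hypothetical timings in seconds, keyed by number of targets (not real data).
serial = {100: [2.0, 2.2, 1.8], 200: [8.1, 7.9, 8.0]}
gpu = {100: [0.05, 0.05, 0.05], 200: [0.10, 0.10, 0.10]}
print(speedup_by_input_size(serial, gpu))
```

Averaging several runs before dividing reduces the effect of run-to-run noise on the reported ratio.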

Nayyara Airlangga Raharjo

SYCL vs CUDA vs HIP: Evaluasi Performa dan Portabilitas Berbagai Model Pemrograman GPU pada GPU NVIDIA dan AMD = SYCL vs CUDA vs HIP: Evaluating the Performance and Portability of Different GPU Programming Models on NVIDIA and AMD GPUs

"In HPC, leveraging GPUs for their parallel processing capabilities can massively accelerate computation, especially for embarrassingly parallel problems. However, when choosing a GPU programming model, one must consider the portability of the programming model and the GPU vendor of the target system. To understand these trade-offs better, this paper analyzes the baseline execution time of CUDA, HIP, and SYCL on both NVIDIA and AMD GPUs. The UVaFTLE program, which determines Lagrangian Coherent Structures through the extraction of Finite-Time Lyapunov Exponents (FTLE), is used to benchmark execution time. The experiments show that CUDA and SYCL consistently beat HIP in execution time on both GPU platforms. The GPU kernel functions were also optimized across platforms, cutting the execution time of the preproc kernel by more than 90%. After the optimizations, SYCL remains the fastest, CUDA comes second, and HIP is clearly the slowest. This paper also discusses the development challenges encountered. CUDA and SYCL have excellent documentation and community support, while HIP's documentation lags behind and does not provide a developer experience as positive as the other two."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2025

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Givarrel Veivel Pattiwael

"In HPC, leveraging GPUs for their parallel processing capabilities can massively accelerate computation, especially for embarrassingly parallel problems. However, when choosing a GPU programming model, one must consider the portability of the programming model and the GPU vendor of the target system. To understand these trade-offs better, this paper analyzes the baseline execution time of CUDA, HIP, and SYCL on both NVIDIA and AMD GPUs. The UVaFTLE program, which determines Lagrangian Coherent Structures through the extraction of Finite-Time Lyapunov Exponents (FTLE), is used to benchmark execution time. The experiments show that CUDA and SYCL consistently beat HIP in execution time on both GPU platforms. The GPU kernel functions were also optimized across platforms, cutting the execution time of the preproc kernel by more than 90%. After the optimizations, SYCL remains the fastest, CUDA comes second, and HIP is clearly the slowest. This paper also discusses the development challenges encountered. CUDA and SYCL have excellent documentation and community support, while HIP's documentation lags behind and does not provide a developer experience as positive as the other two."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2025

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Valerian Salim

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2025

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Alan Novaldi

Paralelisasi Komputasi Penghitungan Jumlah Kendaraan Menggunakan GPU untuk Mendapatkan Data Penghitungan Secara Real Time = Parallelization of Vehicle Count Computation Using GPUs to Obtain Real Time Vehicle Data Count

"An intelligent traffic light system adaptively controls traffic lights based on traffic density. One way to obtain traffic density is to count vehicles, using a deep learning methodology, in video from CCTV cameras installed at an intersection. This study parallelizes the vehicle counting program using the Multiprocessing module in Python to obtain vehicle counts for each road at the intersection. The GPU is then used to obtain data in real time from the heavy video-processing computation. GPU utilization is carried out using CUDA as the platform that connects programs to the GPU at a low level, while high-level management of GPU utilization is done with TensorFlow, which is integrated with CUDA. Experiments were run to find the best program runtime. Parallel computation ran 1.6 times faster than sequential computation. With more optimal GPU utilization, computation ran up to 2 times faster than normal computation. GPU utilization was also shown to improve program runtime because the main video-processing computation no longer runs on the CPU. The results were used to build a visualization of the vehicle count data so that the counts can be processed further by the traffic light control system. At the end of the study, GPU performance was profiled with Nvprof and NVIDIA Visual Profiler, tools provided with CUDA. The profiling shows that GPU usage is still not maximal, as evidenced by low compute utilization, average throughput, and kernel concurrency. The vehicle counting program therefore needs further optimization to use the GPU more effectively."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
