Search Results

Found 12 documents matching the query

Kirk, David B., 1960-

Programming massively parallel processors : a hands-on approach

"This best-selling guide to CUDA and GPU parallel programming has been revised with more parallel programming examples, commonly-used libraries, and explanations of the latest tools. With these improvements, the book retains its concise, intuitive, practical approach based on years of road-testing in the authors' own parallel computing courses. "Programming Massively Parallel Processors: A Hands-on Approach" shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. Updates in this edition include: new coverage of CUDA 4.0, improved performance, enhanced development tools, increased hardware support, and more; increased coverage of related technology OpenCL and new material on algorithm patterns, GPU clusters, host programming, and data parallelism; and two new case studies explore the latest applications of CUDA and GPUs for scientific research and high-performance computing."

Waltham, MA: Morgan Kaufmann, 2013

004.35 KIR p

Textbook SO Universitas Indonesia Library

Heru Suhartanto

Using Dedicated and Non Dedicated HPC Cluster and GPU NVIDIA Tesla C2070 Cloud computing environment to simulate Molecular Dynamics of PfENR Enzyme with AMBER

"ABSTRACT

Molecular Dynamics (MD) is one of the processes that require High Performance Computing environments to complete their jobs. In the preparation of virtual screening experiments, MD is one of the important processes, particularly for tropical countries searching for anti-malaria drugs. The search for anti-malaria drugs has been conducted previously, for example by the WISDOM project, which utilized 1,700 CPUs. Such computing infrastructure is a limitation for a country like Indonesia, which also needs in silico searches for anti-malaria compounds from the country's medicinal plants. Thus, finding a suitable and affordable computing environment is very important. Our previous work showed that our dedicated computing cluster with 16 cores performed better than configurations using fewer cores; however, the computing power of the GTX family of GPUs was much better.

In this study, we extend our previous experiment to find a more suitable computing environment, using the much better hardware specification of a non-dedicated computing cluster and a Tesla GPU. We used two computing environments. The first is the Barrine HPC Cluster of The University of Queensland, which has 384 compute nodes with 3144 computing cores. The second is the Delta Future Grid GPU Cluster, which has 16 computing nodes with 192 computing cores, each node equipped with 2 NVIDIA Tesla C2070 GPUs (448 cores). The results show that running the experiment on dedicated computing power is much better than running it on non-dedicated resources, and that GPU performance is still much better than that of the cluster."

2015

MK-Pdf

Journal Article Universitas Indonesia Library

Heru Suhartanto

The performance of a molecular dynamics simulation for the Plasmodium falciparum enoyl-acyl carrier-protein reductase enzyme using AMBER and GTX 780 and 970 double graphical processing units

"The invention of graphical processing units (GPUs) has significantly improved the speed of long processes used in molecular dynamics (MD) to search for drug candidates to treat diseases, such as malaria. Previous work using a single GTX GPU showed considerable improvement compared to GPUs run in a cluster environment. In the current work, AMBER and dual GTX 780 and 970 GPUs were used to run an MD simulation on the Plasmodium falciparum enoyl-acyl carrier protein reductase enzyme; the results showed that performance was improved, particularly for molecules with a large number of atoms using single GPU."

Depok: Faculty of Engineering, Universitas Indonesia, 2018

UI-IJTECH 9:1 (2018)

Journal Article Universitas Indonesia Library

Wiwien Widyastuti

Kinerja deep convolutional network untuk pengenalan aksara pallawa [Performance of a deep convolutional network for Pallava script recognition]

"This research trained Deep Convolutional Networks(ConvNets) to classify hand-written Pallava alphabet. The Deep ConvNets architecture consists of two convolutional layers, each followed by maxpooling layer, two Fully-Connected layers. It had 442.602 parameters. This model classified 660 images of hand-written Pallava alphabet into 33 diferent classes. To make training faster, this research used GPU implementation with 384 CUDA cores. Two different techniques were implemented, Stochastic Gradient Descent (SGD) and Adaptive Gradient, each trained with 10, 20, 30 and 40 epoch. The best accuracy was 67,5%, achieved by the model with SGD technique trained at 30 epoch."

Yogyakarta: Media Teknika, 2017

620 MT 12:2 (2017)

Journal Article Universitas Indonesia Library

Alan Novaldi

Paralelisasi Komputasi Penghitungan Jumlah Kendaraan Menggunakan GPU untuk Mendapatkan Data Penghitungan Secara Real Time = Parallelization of Vehicle Count Computation Using GPUs to Obtain Real Time Vehicle Data Count

"Sistem lampu lalu lintas cerdas merupakan sistem yang dapat melakukan pengaturan lampu lalu lintas secara adaptif berdasarkan kondisi kepadatan lalu lintas. Salah satu cara untuk mendapatkan kondisi kepadatan lalu lintas adalah melakukan komputasi penghitungan jumlah kendaraan dari video CCTV yang terpasang pada persimpanan. Pada penelitian ini dilakukan paralelisasi program penghitungan jumlah kendaraan menggunakan modul Multiprocessing pada python untuk mendapatkan data penghitungan kendaraan dari setiap jalan di persimpangan. Selanjutnya utilisasi GPU dilakukan untuk mendapatkan data secara real time dari suatu komputasi berat video processing. Pada penelitian ini, utilisasi GPU dilakukan dengan menggunakan CUDA sebagai platform yang dapat menghubungkan program dengan GPU pada low-level. Pengelolaan utilisasi GPU pada high-level dilakukan menggunakan TensorFlow yang sudah terintegrasi dengan CUDA. Uji coba eksekusi program dilakukan untuk mendapatkan runtime terbaik dari eksekusi program. Komputasi secara paralel menghasilkan runtime eksekusi komputasi 1.6 kali lebih cepat jika dibandingkan dengan komputasi secara sekuensial. Pada tingkat utilisasi GPU yang lebih optimal, runtime eksekusi komputasi dapat ditingkatkan hingga 2 kali lebih cepat dari komputasi normal. Utilisasi GPU juga terbukti meningkatkan runtime eksekusi program karena komputasi utama video processing tidak lagi dijalankan menggunakan CPU. Hasil uji eksekusi komputasi digunakan untuk membuat visualisasi data penghitung jumlah kendaraan. Visualisasi ini dilakukan agar data yang penghitungan dapat diproses lebih lanjut untuk sistem pengatur lampu lalu lintas. Pada akhir penelitian dilakukan profiling performa GPU menggunakan Nvprof dan NVIDIA Visual Profiler sebagai tools yang disediakan oleh CUDA. Hasil profiling menunjukkan analisis yang menyatakan bahwa tingkat penggunaan GPU untuk komputasi masih belum secara maksimal dilakukan. 
Hal ini terbukti dari rendahnya angka compute utilization, average throughput dan kernel concurency dari eksekusi program. Sehingga diperlukan adanya optimisasi program penghitungan kendaraan agar utilisasi GPU lebih optimal.

Traffic light intelligence system is an adaptive system which able to control traffic flow on road intersection based on traffic condition. Traffic density information can be obtained from vehicle counting computation using deep learning methodology on CCTV record video data of a road intersection. This study performed parallelization of the vehicle counting computation using the Multiprocessing module in python to get the number of vehicles approaching the intersection. GPU Utilization is performed to obtain vehicle counting data in real time from a heavy computation like video-processing. GPU utilization is carried out using CUDA as a platform that can connect programs with GPUs at low-level architecture. GPU utilization management at high-level is done using TensorFlow which has been integrated with CUDA. Some experiments are performed to get the best runtime from program execution. Parallel computation produces runtime execution 0.6 times faster compared to sequential computation. On more GPU compute utilization optimization, parallel computation can produce runtime 2 times more compared to normal computation. GPU utilization has also been proven to increase the program execution runtime because the main computational video processing is no longer run on the CPU. The experiment result on vehicle detection used to create data visualization about vehicle counting on a road intersection. Data visualization is done so that the vehicle data can be further processed for the traffic light control system. At the end of the study GPU performance profiling was done using Nvprof and NVIDIA Visual Profiler as tools provided by CUDA. Profiling results show that analysis states that the level of GPU usage for computing is still not maximally done. This analysis is shown from the low number of compute utilization, average throughput and kernel concurrency of program execution. GPU utilization need to be optimized in order the program can run optimally on GPU."
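
Speedup figures such as the 1.6x and 2x quoted in this abstract are conventionally the ratio of sequential runtime to parallel runtime. The sketch below shows that arithmetic. The timings are hypothetical values chosen only to reproduce a 1.6x ratio; they are not measurements from the thesis.

```python
def speedup(sequential_seconds, parallel_seconds):
    """Return how many times faster the parallel run is than the sequential run."""
    if sequential_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return sequential_seconds / parallel_seconds


def percent_runtime_saved(sequential_seconds, parallel_seconds):
    """Return the share of the sequential runtime that the parallel run saves, in percent."""
    return (1 - parallel_seconds / sequential_seconds) * 100


# Hypothetical timings (assumptions for illustration only).
t_sequential = 80.0
t_parallel = 50.0

print(f"speedup: {speedup(t_sequential, t_parallel):.2f}x")
print(f"runtime saved: {percent_runtime_saved(t_sequential, t_parallel):.1f}%")
```

Note that a 1.6x speedup corresponds to a 37.5% shorter runtime, not 60%, which is why it helps to state which of the two quantities is being reported.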

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

S-pdf

UI - Undergraduate Thesis Membership Universitas Indonesia Library

Mohammad Rizky Chairul Azizi

Implementasi Fitur Semantic Object Segmentation pada Aplikasi Lumba.ai = Implementation of the Semantic Object Segmentation Feature in the Lumba.ai Application

"Di era perkembangan teknologi ini, sains data menjadi kebutuhan dalam pekerjaan manusia, sehingga peneliti mengembangkan Lumba.ai untuk memudahkan masyarakat umum mengakses teknologi data science dan computer vision, khususnya fitur semantic object segmentation, tanpa memerlukan pemahaman mendalam tentang IT. Penelitian ini berfokus pada pengembangan fitur semantic object segmentation pada Lumba.ai dengan memanfaatkan model Convolutional Neural Network seperti Fully Convolutional Networks (FCN) dan DeepLabv3. Proses implementasinya meliputi pemrosesan data, pemodelan, dan evaluasi model menggunakan metrik, serta komparasi model dengan menggunakan weighted binary cross entropy. Hasil menunjukkan komparasi metrik pada model-model machine learning yang diuji menunjukkan FCN dan DeepLabv3 merupakan dua model dengan performa terbaik dengan mendapatkan skor IoU dan Recall tertinggi yang didukung ResNet101 sebagai backbone serta diterapkan W-BCE. Dalam pengembangannya, penulis mengimplementasi task queueing dan monitoring GPU guna memproses request pengguna dengan optimal saat melakukan training. Dari penelitian ini, didapat hasil yang cukup baik dengan melakukan konfigurasi satu celery worker dan jumlah concurrency yang dinamis bergantung kepada jumlah GPU yang available dari proses monitoring GPU.

In this era of technological development, data science has become essential in human work, prompting researchers to develop Lumba.ai to facilitate public access to data science and computer vision technology, particularly the feature of semantic object segmentation, without requiring deep IT knowledge. This research focuses on developing the semantic object segmentation feature on Lumba.ai by utilizing Convolutional Neural Network models such as Fully Convolutional Networks (FCN) and DeepLabv3. The implementation process includes data processing, modeling, and model evaluation using metrics, as well as model comparison using weighted binary cross entropy. The results show that the comparison of metrics on the tested machine learning models indicates that FCN and DeepLabv3 are the two best-performing models, achieving the highest IoU and Recall scores, supported by ResNet101 as the backbone and applying W-BCE. During development, the author implemented task queuing and GPU monitoring to optimally process user requests during training. The research produced satisfactory results by configuring a single celery worker and dynamic concurrency depending on the number of GPUs available from the GPU monitoring process."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Undergraduate Thesis Membership Universitas Indonesia Library

Lababan, Tara Mazaya

Analisis Performa GPU pada OpenStack Nova, Zun, dan Ironic Menggunakan Glmark2, Phoronix PyTorch, dan Phoronix NAMD pada Lingkungan Komputasi Awan Fasilkom UI = GPU Performance Analysis on OpenStack Nova, Zun, and Ironic Using Glmark2, Phoronix PyTorch, and Phoronix NAMD on the Fasilkom UI Cloud Computing Environment

"Penelitian ini menganalisis dampak dari penggunaan Virtual Machine (VM), container, dan bare-metal terhadap performa Graphics Processing Unit (GPU) dengan memanfaatkan VM pada OpenStack Nova, container pada OpenStack Zun, dan bare-metal pada OpenStack Ironic. Metode virtualisasi GPU yang digunakan pada penelitian ini adalah GPU passthrough. Pengukuran performa GPU dilakukan dengan menggunakan aplikasi Glmark2 untuk menguji performa graphic rendering, Phoronix NAMD untuk menguji performa simulasi molekuler, dan Phoronix PyTorch untuk menguji performa training model. Hasil analisis menunjukkan bahwa penggunaan VM pada OpenStack Nova mengakibatkan penurunan performa GPU sebesar 15.5% pada Glmark2, 44.0% pada Phoronix NAMD, dan 8.4% pada Phoronix PyTorch. Penggunaan container pada Open Stack Zun mengakibatkan penurunan performa GPU sebesar 5.8% pada Glmark2 dan 19.7% pada Phoronix NAMD, tetapi tak ada perbedaan signifikan pada Phoronix PyTorch jika dibandingkan dengan physical machine (Î± = 0.05). Penggunaan bare-metal pada OpenStack Ironic mengakibatkan penurunan performa sebesar 1.5% pada Phoronix NAMD dan peningkatan tak signifikan sebesar -6.2% pada Phoronix PyTorch. Pengujian Glmark2 pada OpenStack Ironic dengan perlakuan yang sama seperti benchmark lainnya menunjukkan adanya penurunan performa sebesar 8.7%. Namun, perlakuan khusus pada Glmark2 OpenStack Ironic menunjukkan peningkatan performa sebesar -1.0% pada resolusi 1920x1080 jika dibandingkan dengan physical machine. Perlakuan khusus ini berupa menjalankan dummy Glmark2 dengan resolusi yang sangat rendah dan Glmark2 utama secara bersamaan. Berdasarkan hasil penelitian, dapat disimpulkan bahwa urutan computing resource dengan penurunan performa GPU yang paling minimal adalah penggunaan bare-metal OpenStack Ironic, diikuti dengan penggunaan container OpenStack Zun, dan diikuti dengan penggunaan VM OpenStack Nova.

This research analyzes the effects of Virtual Machine (VM), containers, and bare-metal usage on Graphics Processing Unit (GPU) performance, using VMs provided by OpenStack Nova, containers provided by OpenStack Zun, and bare-metal provided by OpenStack Ironic. The GPU virtualization method employed in this paper is GPU passthrough. GPU performance is measured using multiple benchmark applications, those being Glmark2 to measure graphic rendering performance, Phoronix NAMD to measure molecular simulation performance, and Phoronix PyTorch to measure training model performance. The results of our analysis shows that the usage of OpenStack Nova’s VMs causes GPU performance slowdown of up to 15.5% on Glmark2, 44.0% on Phoronix NAMD and 8.4% on Phoronix PyTorch. Using OpenStack Zun’s containers also causes GPU performance slowdowns of up to 5.8% on Glmark2 and 19.7% on Phoronix NAMD, with no significant changes on GPU performance with Phoronix PyTorch compared to the physical machine (Î± = 0.05). In contrast, using OpenStack Ironic’s bare-metal causes GPU performance slowdown of 1.5% on Phoronix NAMD and an insignificant increase in performance on Phoronix PyTorch by 6.2%. Meanwhile the results of the Glmark2 benchmark on OpenStack Ironic following the normal procedures shows GPU performance slowdown of up to 8.7%. However, the same Glmark2 OpenStack Ironic benchmark with a special procedure shows an increase in GPU performance of up to 1.0% on the 1920x1080 resolution compared to the physical machine. This special procedure involves running a dummy Glmark2 process with a tiny resolution in parallel with the main Glmark2 process. Based on the results, we can conclude that the hierarchy of computing resources in terms of minimal GPU performance slowdown starts with the usage of OpenStack Ironic’s bare-metal, followed by the usage of OpenStack Zun’s containers, and lastly the usage of OpenStack Nova’s VMs."
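
The slowdown percentages in this abstract can be read as relative differences from the physical-machine baseline. The sketch below shows one way to compute such a figure and to rank the three platforms by mean slowdown using the numbers quoted above. It makes several assumptions that are not from the thesis: the benchmark scores are treated as higher-is-better, the "no significant difference" result for Zun on PyTorch is entered as 0.0, the standard-procedure Glmark2 result is used for Ironic, and the ranking by simple mean is only an illustration, not the author's method.

```python
from statistics import mean


def slowdown_percent(baseline_score, measured_score):
    """Relative performance loss versus the baseline, for a higher-is-better score."""
    return (baseline_score - measured_score) / baseline_score * 100


# Slowdown in percent as quoted in the abstract; negative means faster than baseline.
reported_slowdown = {
    "Nova (VM)": {"Glmark2": 15.5, "NAMD": 44.0, "PyTorch": 8.4},
    # Assumption: "no significant difference" is recorded here as 0.0.
    "Zun (container)": {"Glmark2": 5.8, "NAMD": 19.7, "PyTorch": 0.0},
    # Assumption: the standard-procedure Glmark2 result (8.7%) is used.
    "Ironic (bare-metal)": {"Glmark2": 8.7, "NAMD": 1.5, "PyTorch": -6.2},
}


def rank_by_mean_slowdown(table):
    """Return platform names ordered from smallest to largest mean slowdown."""
    return sorted(table, key=lambda platform: mean(table[platform].values()))


ranking = rank_by_mean_slowdown(reported_slowdown)
for platform in ranking:
    print(f"{platform}: {mean(reported_slowdown[platform].values()):.1f}% mean slowdown")
```

Under these assumptions the ordering agrees with the abstract's conclusion: bare-metal first, then containers, then VMs.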

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Undergraduate Thesis Membership Universitas Indonesia Library