
Search Results

Found 2 documents matching the query
Agung Santosa
"[ABSTRAK
Pesatnya perkembangan Deep Learning akhir-akhir ini juga menyentuh ASR
berbasis HMM, sehingga memunculkan teknik hibrid HMM-ANN. Salah satu
teknik Deep Learning yang cukup menjanjikan adalah penggunaan arsitektur
CNN. CNN yang memiliki kemampuan mendeteksi local correlation sesuai
untuk digunakan pada data spectrum suara. Spectrogram memiliki karakteristik
local correlation yang nampak secara visual. Penelitian ini adalah eksperimen
penggunaan spectrogram sebagai fitur untuk HMM-CNN untuk meningkatkan
kinerja ASR berbasis HMM. Penelitian menyimpulkan spectogram dapat
digunakan sebagai fitur untuk HMM-CNN untuk meningkatkan kinerja ASR
berbasis HMM.

ABSTRACT
The recent surge in Deep Learning has also affected HMM-based ASR, giving rise to the hybrid HMM-ANN technique. One of the promising Deep Learning techniques is the use of the CNN architecture. The ability of CNNs to detect local correlation makes them well suited to speech spectral data. The spectrogram, as a speech spectral representation, has local correlation characteristics that are visually observable. This research is an experiment in using the spectrogram as a feature for HMM-CNN to improve the performance of HMM-based ASR. The research found that the spectrogram can indeed be used as a feature for the CNN to improve the performance of HMM-based ASR.]"
2015
T43862
UI - Tesis Membership  Universitas Indonesia Library
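
The abstract above gives no implementation details, so the following is only a minimal sketch of the hybrid HMM-CNN idea it describes, assuming librosa for spectrogram extraction and Keras for the CNN. The layer sizes, the 11-frame context window, and the number of HMM states (NUM_STATES) are illustrative assumptions, not values taken from the thesis. In a hybrid system the CNN's state posteriors, divided by state priors, stand in for GMM likelihoods during Viterbi decoding, and the frame-level state labels used for training typically come from a forced alignment produced by a baseline GMM-HMM system.

    import numpy as np
    import librosa
    from tensorflow.keras import layers, models

    NUM_STATES = 120        # hypothetical number of tied HMM states (senones)
    CONTEXT_FRAMES = 11     # spectrogram patch width fed to the CNN
    N_FFT = 400             # 25 ms analysis window at 16 kHz
    HOP = 160               # 10 ms frame shift at 16 kHz

    def log_spectrogram(wav_path):
        """Log-magnitude spectrogram; rows are frequency bins, columns are frames."""
        y, sr = librosa.load(wav_path, sr=16000)
        stft = librosa.stft(y, n_fft=N_FFT, hop_length=HOP)
        return np.log(np.abs(stft) + 1e-6)

    def build_cnn(n_bins):
        """CNN that maps a spectrogram patch to HMM state posteriors.
        The 2-D convolutions exploit the local correlation that the
        abstract points out is visible in the spectrogram."""
        model = models.Sequential([
            layers.Input(shape=(n_bins, CONTEXT_FRAMES, 1)),
            layers.Conv2D(32, (5, 3), activation="relu", padding="same"),
            layers.MaxPooling2D((2, 1)),            # pool along frequency only
            layers.Conv2D(64, (5, 3), activation="relu", padding="same"),
            layers.MaxPooling2D((2, 1)),
            layers.Flatten(),
            layers.Dense(256, activation="relu"),
            layers.Dense(NUM_STATES, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        return model
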
Arvalinno
"

Artificial Intelligence (AI) has developed rapidly in sectors such as speech recognition, computer vision, Natural Language Processing, and others. One important sector that researchers have developed extensively is Speech Emotion Recognition, the recognition of emotion from the human voice. This research area keeps growing because of the challenge of making human-machine interaction more natural: a machine that can respond to human emotion with an appropriate reply. The Speech Emotion Recognition system designed in this research uses a dataset of extracted audio features, namely MFCC, Spectrogram, Mel Spectrogram, Chromagram, and Tonnetz, and applies the VGG-16 Transfer Learning method to train the model. The dataset was obtained by trimming audio from several Indonesian-language films, and the resulting audio clips were then extracted into the five feature types mentioned above. The best model accuracy in this research was achieved by the VGG-16 transfer learning model with the Mel Spectrogram dataset, with an accuracy of 56.2%. In testing the models' recognition of each emotion, the best f1-score was obtained by the VGG-16 transfer learning model with the Mel Spectrogram dataset, at 55.5%. The mel scale applied in mel spectrogram feature extraction contributes to the model's ability to recognize human emotion.


Artificial Intelligence has been used in many sectors, such as speech recognition, computer vision, Natural Language Processing, etc. Another important sector that researchers have developed extensively is Speech Emotion Recognition. This research area is growing because of the challenge of achieving more natural interaction between machines and humans, in which machines can respond to human emotions and give appropriate feedback. In this research, the speech emotion recognition system uses extracted audio features such as MFCC, Spectrogram, Mel Spectrogram, Chromagram, and Tonnetz as input, and the VGG-16 Transfer Learning method for model training. The dataset was collected by trimming audio from several Indonesian movies; the trimmed audio was then extracted into the five features mentioned above. The best model accuracy was reached by VGG-16 with the Mel Spectrogram dataset, at 56.2%. In terms of recognizing each emotion, the best f1-score was reached by the VGG-16 model with the Mel Spectrogram dataset, at 55.5%. The mel scale applied in mel spectrogram feature extraction affects the model's ability to recognize human emotion.

"
Depok: Fakultas Teknik Universitas Indonesia, 2022
S-Pdf
UI - Skripsi Membership  Universitas Indonesia Library
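
A minimal sketch of the pipeline described in the second abstract, assuming librosa for the five feature extractions and Keras with an ImageNet-pretrained VGG-16 base for transfer learning. NUM_EMOTIONS, the sampling rate, the classifier head, and the 224x224x3 input size are illustrative assumptions, not details from the thesis. Since VGG-16 expects fixed-size three-channel images, each feature matrix would still need to be rendered or resized to that shape (for example by resizing and repeating the single channel) before training; the abstract does not say how this was done.

    import numpy as np
    import librosa
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    NUM_EMOTIONS = 6   # hypothetical number of emotion classes; not stated in the abstract

    def extract_features(wav_path, sr=22050):
        """Return the five feature matrices named in the abstract."""
        y, sr = librosa.load(wav_path, sr=sr)
        return {
            "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
            "spectrogram": np.abs(librosa.stft(y)),
            "mel_spectrogram": librosa.feature.melspectrogram(y=y, sr=sr),
            "chromagram": librosa.feature.chroma_stft(y=y, sr=sr),
            "tonnetz": librosa.feature.tonnetz(y=y, sr=sr),
        }

    def build_vgg16_classifier(input_shape=(224, 224, 3)):
        """VGG-16 transfer learning: a frozen ImageNet convolutional base
        plus a new classification head trained on the feature images."""
        base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
        base.trainable = False                    # keep pretrained weights fixed
        model = models.Sequential([
            base,
            layers.Flatten(),
            layers.Dense(256, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(NUM_EMOTIONS, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model
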