Search Results

Found 187346 documents matching the query

Imam Syafei

Analisis kinerja kombinasi metode berbasis lexicon dan metode berbasis learning pada analisis sentimen twitter = Performance analysis on combination lexicon based method and learning based method in twitter sentiment analysis

"Analisis sentimen merupakan kegiatan untuk mencari pendapat atau sentimen seorang penulis tentang suatu entitas atau objek tertentu yang dapat berupa pendapat positif atau pendapat negatif. Analisis sentimen pada data yang sangat besar tidak dapat dilakukan secara manual sehingga membutuhkan bantuan metode non-manual. Terdapat dua metode non-manual dasar, yaitu metode berbasis lexicon dan metode berbasis learning. Pada skripsi ini, akan dibahas kombinasi metode berbasis lexicon dan metode berbasis learning. Metode kombinasi ini akan digunakan untuk melakukan analisis sentimen dengan data teks yang berasal dari media sosial Twitter atau biasa disebut tweets. Data yang dikumpulkan berupa tweets yang membicarakan seputar tokoh kandidat calon presiden Republik Indonesia periode 2014-2019.

Sentiment analysis is the task of identifying an author's opinion or sentiment, positive or negative, about a particular entity or object. Sentiment analysis on very large data sets cannot be done manually, so it requires non-manual methods. There are two basic non-manual methods: the lexicon-based method and the learning-based method. This thesis discusses a combination of the lexicon-based and learning-based methods. The combined method is used to perform sentiment analysis on text data from the social media platform Twitter, commonly called tweets. The data collected are tweets discussing the candidates for President of the Republic of Indonesia for the 2014-2019 term."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2014

S55146

UI - Skripsi Membership Universitas Indonesia Library

Syahrul Amrie

Analisis sentimen terhadap layanan imigrasi menggunakan data Twitter, Instagram dan ulasan pada aplikasi M-Paspor di Google play store berbasis pembelajaran mesin = Sentiment analysis on immigration services using data Twitter, Instagram and review application M-paspor on Google play store based on machine learning

"Perkembangan media sosial telah berkembang pesat, tidak hanya sebagai alat komunikasi sosial antar individu. Fungsi dan kegunaannya semakin berkembang serta banyak dimanfaatkan organisasi swasta maupun pemerintah untuk mengukur tingkat layanan. Ditjen Imigrasi selaku organisasi pemerintah merupakan salah satu organisasi yang memanfaatkan media sosial, salah satu fungsinya untuk mengetahui apakah layanan yang diberikan telah diterima dengan baik oleh masyarakat. Selain melalui media sosial, Imigrasi juga telah meluncurkan aplikasi M-Paspor di platform Google Play Store, di platform tersebut Imigrasi juga dapat mengetahui tingkat efektivitas dari aplikasi yang telah diluncurkan. Berdasarkan survei yang dilakukan oleh Balitbangham yang merupakan internal dari Kemenkumham, layanan yang diberikan oleh imigrasi mendapat nilai sangat baik, namun faktanya pada media sosial maupun google play store banyak komentar maupun ulasan yang kurang puas dengan pelayanan pihak imigrasi. Hal tersebut menjadi kontradiksi antara hasil survei Balitbangham dan data di media sosial. Namun, akan sulit untuk melakukan analisis data media sosial dikarenakan jumlah yang banyak. Oleh karena itu, perlu dilakukan untuk mengusulkan sistem untuk melakukan analisis sentimen menggunakan data teks komentar dan ulasan. Sehingga pihak Imigrasi dapat mengambil langkah terbaik untuk dapat memperbaiki layanan yang masih belum maksimal. Dataset yang digunakan berupa data yang diambil dari media sosial Twitter dan Instagram serta ulasan pada Google Play Store. Hasil penelitian menunjukan jika fitur ekstraksi TF-IDF Unigram yang dipadukan dengan algoritma Support Vector Machine (SVM) serta SMOTE menghasilkan performa paling tinggi dibandingkan dengan nave Bayes (NB) maupun Random Forest (RF). dalam melakukan klasifikasi, SVM menghasilkan dengan hasil Precision 72%, Recall 69%, Accurasy 69, serta F1-Score sebesar 68%. 
Model tersebut dapat digunakan Imigrasi untuk mengetahui umpan balik pelayanan dari masyarakat yang dapat digunakan sebagai pertimbangan dalam melakukan perbaikan pelayanan serta merumuskan strategi pelayanan oleh Direktorat terkait agar pelayanan lebih efisien untuk ke depannya. Sehingga, Imigrasi akan mampu dengan cepat merespon kendala yang dihadapi oleh masyarakat.

Social media has developed rapidly and is no longer only a means of social communication between individuals. Its functions and uses keep growing, and it is widely used by private and government organizations to measure service levels. The Directorate General of Immigration, as a government organization, is one of the organizations that uses social media, among other things to find out whether the services it provides have been well received by the public. Apart from social media, Immigration has also launched the M-Paspor application on the Google Play Store, where it can also gauge the effectiveness of the application. According to a survey conducted by Balitbangham, an internal unit of the Ministry of Law and Human Rights (Kemenkumham), the services provided by Immigration received a very good score, yet on social media and the Google Play Store many comments and reviews express dissatisfaction with Immigration's services. This contradicts the results of the Balitbangham survey. However, social media data is difficult to analyze because of its volume. Therefore, this work proposes a system that performs sentiment analysis on the text of comments and reviews, so that Immigration can take the best steps to improve services that are not yet optimal. The dataset consists of data taken from Twitter and Instagram as well as reviews on the Google Play Store. The results show that TF-IDF unigram feature extraction combined with the Support Vector Machine (SVM) algorithm and SMOTE produces the highest performance compared with Naïve Bayes (NB) and Random Forest (RF). In classification, SVM achieves 72% precision, 69% recall, 69% accuracy, and a 68% F1-score.
This model can be used by Immigration to obtain service feedback from the public as input for improving services and for formulating more efficient service strategies in the relevant Directorate. Immigration will thus be able to respond quickly to the obstacles faced by the public."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Yusak Sutikno

Analisis Sentimen dan Pemodelan Topik pada Jasa Pengiriman Domestik di Era Covid-19 Berbasis Time Window Lexicon-TFIDF-SVM dan LDA-Mallet = Sentiment Analysis and Topic Modeling in Domestic Delivery Services in The Covid-19 Era Based on The Lexicon-TFIDF-SVM and LDA-Mallet Time Window

"Pandemi covid-19 dan kebijakan-kebijakan penanggulangannya telah mengubah cara hidup dan kebiasaan banyak orang di seluruh dunia. Terbatasnya pergerakan dan aktivitas masyarakat mendorong mereka untuk mengandalkan sektor pengiriman barang dalam upaya pemenuhan kebutuhan. Hal ini menjadikan sektor usaha pengiriman barang menjadi bagian penting dalam pemenuhan kebutuhan masyarakat di tengah pandemi. Tersedianya akun layanan resmi tiap penyedia barang di media sosial Twitter sebagai wadah pengaduan dan aspirasi pelanggan, memungkinkan untuk dilakukan analisis tren kebutuhan hingga mengukur kepuasan pelanggan terhadap layanan sektor jasa ini sebelum dan selama pandemi. Penelitian mengenai analisis sentimen pelanggan terhadap suatu produk maupun jasa sudah banyak dilakukan, namun implementasi pendekatan analisis Time Window Lexicon–TFIDF-SVM dan pemodelan topik LDA-Mallet terintegrasi belum banyak dilakukan, terutama dalam konteks analisis sentimen pada sektor jasa pengiriman barang. Penelitian ini menggunakan data Twitter yang diperoleh dengan metode scrapping dengan rentang waktu Oktober 2019 - September 2020 pada lima penyedia layanan pengiriman barang paling populer di Indonesia. Pendekatan leksikon dipergunakan dalam pembentukan data latih, dimana dari data latih ini diperoleh model klasifikasi memperoleh tingkat akurasi 89,21% kemudian diinferensikan dengan pendekatan statistik TFIDF-SVM untuk memprediksi polaritas sentimen keseluruhan data. Penelitian ini memberikan hasil bahwa: (1) Pandemi covid-19 melalui parameter kebijakan penanganan pandemi secara signifikan meningkatkan aktivitas penyampaian keluhan/aspirasi dimana hal ini menunjukkan terjadinya peningkatan jumlah layanan yang diberikan; (2) sistem pelayanan pengiriman belum cukup kuat untuk menghadapi fluktuasi permintaan, dimana peningkatan jumlah pelayanan dibarengi juga dengan peningkatan ketidakpuasan yang terindikasi dari meningkatnya polaritas sentimen ‘Negatif’ selama pandemi. 
Pada periode tiga bulan kedua terlihat bahwa adaptasi dan perbaikan layanan hanya terjadi pada sebagian penyedia layanan saja; dan (3) terdapat beberapa perubahan topik keluhan/aspirasi yang dilihat pada rentang waktu sebelum pandemi, tiga bulan pertama pandemi, dan tiga bulan kedua pandemi.

The Covid-19 pandemic and the activity restriction policies intended to contain its spread have changed the way of life and habits of many people around the world. Limited movement and community activity encourage people to rely on the delivery sector to meet their needs. This makes goods delivery an important part of meeting people's needs in the midst of the pandemic. The official service accounts that each service provider maintains on Twitter as a channel for customer complaints and aspirations make it possible to analyze trends in service needs and to measure customer satisfaction with this service sector before and during the pandemic. Customer sentiment analysis for products and services has been studied extensively, but the Time Window Lexicon-TFIDF-SVM approach integrated with LDA-Mallet topic modeling has rarely been applied, especially to sentiment analysis in the goods delivery sector. This research uses Twitter data obtained by scraping, covering October 2019 to September 2020, for the five most popular delivery service providers in Indonesia. The lexicon approach is used to build the training data; the classification model obtained from this training data reaches an accuracy of 89.21% and is then applied, using the TFIDF-SVM statistical approach, to predict the sentiment polarity of the whole data set. The study finds that: (1) the Covid-19 pandemic, through the parameters of pandemic management policy, significantly increased the activity of submitting complaints/aspirations, indicating an increase in the number of services requested or provided; (2) the delivery service system is not yet robust enough to cope with fluctuations in demand, as the increase in the number of services is accompanied by an increase in dissatisfaction, although it is not significant for all service providers.
In the second three-month period, adaptation and improvement of services appear to have occurred at only some of the service providers; and (3) the topics of complaints/aspirations changed in several ways across the period before the pandemic, the first three months of the pandemic, and the second three months of the pandemic."

Depok: Fakultas Teknik Universitas Indonesia, 2020

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Evan Benedict Zaluchu

Analisis sentimen twitter pada cloudera berbasis AFINN word list menggunakan apache hadoop, flume, dan hive = AFINN word list based twitter sentiment analysis in cloudera using apache hadoop, flume, and hive

"ABSTRAK

Big Data adalah salah satu fenomena yang sudah tidak jarang terjadi di berbagai aspek-aspek kehidupan, baik di bidang industri, keuangan, sosial, dan sebagainya. Dari segi sosial, penggunaan media sosial seperti Twitter merupakan salah satu aplikasi nyata dari teknologi Big Data. Melalui opini-opini yang disampaikan pada Twitter, kita dapat mengetahui hal-hal apa saja yang menjadi topik terkini. Dengan besarnya jumlah tweet yang dipublikasikan tiap hari, atau tiap jam, membuat analisis terhadap Twitter ini hampir mustahil dilakukan tanpa menggunakan teknologi komputasi. Environment seperti Hadoop, Flume, dan Hive merupakan salah satu teknologi dapat digunakan untuk menganalisis jumlah data yang besar, yang mengalir di dalam Twitter.

ABSTRACT

Big Data is a phenomenon that has become common in many aspects of daily life, such as the industrial, financial, and social sectors. From the social perspective, the use of social media such as Twitter is one real application of Big Data technology. Through the opinions expressed on Twitter, we can find out which topics are currently trending. The number of tweets published every day, or every hour, makes analyzing Twitter nearly impossible without computational technology. Environments such as Hadoop, Flume, and Hive are among the technologies that can be used to analyze the large volume of data that flows through Twitter."
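The title of this record names an AFINN word list as the basis of the sentiment scoring. As a rough illustration of how a word-list approach of that kind can work, the sketch below sums per-word integer valences and maps the total to a label. The lexicon entries, their scores, and the function names `score_tweet` and `label` are illustrative assumptions; they are not taken from the thesis or from the actual AFINN file, and the thesis itself runs on Hadoop, Flume, and Hive rather than plain Python.

```python
import re

# Toy word list in the AFINN style (one integer valence per word).
# These entries and scores are illustrative assumptions, not the real AFINN file.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "slow": -1}


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words contribute 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def label(score):
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet with no words from the list scores 0 and is labelled neutral, which is one reason word-list methods depend heavily on lexicon coverage.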

2017

S67967

UI - Skripsi Membership Universitas Indonesia Library

Mohammad Luthfi Pratama

Studi komparasi metode multiclass support vector machine untuk masalah analisis sentimen pada twitter = Comparative study of multiclass support vector machine method for sentiment analysis problem on twitter

"Perkembangan teknologi informasi khususnya internet di Indonesia terbilang sangat pesat. Media sosial hadir sebagai sarana baru dalam berkomunikasi dengan perantara internet. Salah satu media sosial pemicu hal tersebut adalah twitter. Banyak sekali variasi topik yang dihasilkan para pengguna twitter. Setiap topik yang dihasilkan memiliki nilai sentimen. Nilai sentimen dibagi menjadi positif, negatif, dan netral. Untuk mengetahui nilai sentimen, digunakanlah analisis sentimen. Namun dengan banyaknya pengguna twitter, akan memakan waktu banyak untuk mengetahui nilai sentimen. Sehingga digunakanlah Support Vector Machine (SVM). Tetapi SVM hanya bisa mengklasifikasikan 2 kelas. Sehingga diperlukan pendekatan Multiclass. terdapat dua cara dalam melakukan pendekatan Multiclass, yaitu pendekatan One Vs One dan One Vs All.

The growth of information technology, especially the Internet, in Indonesia is very rapid. Social media has emerged as a new way to communicate over the internet. Twitter is one of the social media platforms that contribute to this growth. Twitter users generate a wide variety of topics, and each topic carries a sentiment value: positive, negative, or neutral. Sentiment analysis is used to determine this value. However, with so many Twitter users, determining sentiment values would take a lot of time, which is why a Support Vector Machine (SVM) is used. An SVM by itself can only classify two classes, so a multiclass approach is needed. There are two such approaches: One vs One and One vs All."
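To make the difference between the two multiclass approaches concrete, the following minimal sketch enumerates the binary sub-problems each one creates and shows a simple majority vote for One vs One. The function names are hypothetical and the sketch does not train any SVM; it only illustrates the decomposition, not the thesis's implementation.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def one_vs_one_vote(pair_winners):
    """Pick the class that wins the most pairwise contests (ties: first seen)."""
    counts = {}
    for winner in pair_winners:
        counts[winner] = counts.get(winner, 0) + 1
    return max(counts, key=counts.get)
```

For k classes, One vs All needs k binary classifiers and One vs One needs k(k-1)/2. With the three sentiment classes used here both give three classifiers; the counts diverge as k grows.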

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2014

S58011

UI - Skripsi Membership Universitas Indonesia Library

Nadya Safitri

Perbandingan Machine Learning dan Deep Learning pada Klasifikasi Teks dan Analisis Sentimen terhadap Dampak Covid-19 di Indonesia pada Twitter dengan Pendekatan Multi-Label = Comparative of Machine Learning and Deep Learning on Text Classification and Sentiment Analysis on the Impact of Covid-19 in Indonesia on Twitter Using a Multi-Label Approach

"Pemilihan metode machine learning atau deep learning menjadi suatu permasalahan dalam klasifikasi. Hal ini didapatkan dari penelitian yang menunjukkan bahwa deep learning kinerjanya lebih baik daripada machine learning, namun terdapat penelitian bahwa kedua metode tersebut kinerjanya tidak menentu tergantung dataset yang digunakan. Oleh karena itu, penelitian ini membandingkan kinerja dari machine learning dan deep learning untuk permasalahan klasifikasi teks dan analisis sentimen terhadap dampak Covid-19 di Indonesia. Hasil penelitian ini menunjukkan bahwa kinerja pada klasifikasi teks dan analisis sentimen menggunakan metode machine learning lebih baik dibandingkan dengan deep learning. Hasil penelitian mengenai klasifikasi teks menunjukkan bahwa kinerja metode machine learning yaitu Label Powerset dan Random Forest menghasilkan akurasi 77 % sedangkan kinerja metode deep learning yaitu Long Short-Term Memory (LSTM) dan Gate Reccurent Unit (GRU) menghasilkan akurasi 48%. Hasil penelitian mengenai analisis sentimen menunjukkan bahwa kinerja metode machine learning yaitu Label Powerset dan Random Forest menghasilkan akurasi 63 % sedangkan kinerja metode deep learning yaitu Long Short-Term Memory (LSTM) dan Gate Reccurent Unit (GRU) menghasilkan akurasi 55% dan 54%. Keseimbangan jumlah label pada semua label mempengaruhi hasil dari klasifikasi. Oleh karena itu, disarankan untuk menggunakan metode untuk menyeimbangkan jumlah label yang digunakan untuk klasifikasi.

Choosing between machine learning and deep learning methods is a recurring problem in classification. Some studies show that deep learning performs better than machine learning, while others find that the relative performance of the two depends on the dataset used. This study therefore compares the performance of machine learning and deep learning on text classification and sentiment analysis of the impact of Covid-19 in Indonesia. The results indicate that machine learning methods perform better than deep learning on both tasks. For text classification, the machine learning methods, namely Label Powerset and Random Forest, reach an accuracy of 77%, while the deep learning methods, namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), reach an accuracy of 48%. For sentiment analysis, Label Powerset and Random Forest reach an accuracy of 63%, while LSTM and GRU reach 55% and 54%. The balance of label counts across all labels affects the classification results, so it is advisable to use a method that balances the number of labels used for classification."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Amanda Nydia Augustizhafira

Akurasi transfer learning menggunakan metode neural network untuk masalah analisis sentimen pada tweets berbahasa Indonesia = The accuracy of transfer learning using neural network method for sentiment analysis problem on Indonesian tweets

"Analisis sentimen merupakan bagian dari data mining text mining , yaitu proses memahami, mengekstrak, dan mengolah data tekstual secara otomatis untuk mendapatkan informasi. Pada penelitian ini, analisis sentimen diterapkan pada salah satu media sosial, yaitu Twitter. Analisis sentimen tergolong sebagai masalah klasifikasi yang dapat diselesaikan menggunakan salah satu metode machine learning, yaitu Neural Network. Pada machine learning, data dibagi menjadi data pelatihan dan data pengujian yang berasal dari domain yang sama.

Permasalahan utama pada penelitian ini adalah data pelatihan dan data pengujian berasal dari dua domain yang berbeda, sehingga perlu diterapkan pembelajaran lain selain machine learning. Masalah tersebut dapat diselesaikan dengan menggunakan transfer learning. Transfer learning merupakan suatu pembelajaran model yang dibangun oleh suatu data pelatihan dari suatu domain dan diuji oleh suatu data pengujian dari domain yang berbeda dari domain data pelatihan. Simulasi dalam penelitian ini menghasilkan suatu akurasi transfer learning dengan metode Neural Network yang nantinya akan diuji dengan fitur n-gram bi-gram dan tri-gram serta satu metode seleksi fitur, yaitu Extra-Trees Classifier.

Dalam penelitian ini, nilai akurasi transfer learning tertinggi didapat saat hidden layer berjumlah satu. Sebagian besar nilai akurasi tertinggi didapat saat penggunaan 250 neuron pada hidden layer. Fungsi aktivasi ReLU dan tanh menghasilkan nilai akurasi yang lebih tinggi dibandingkan fungsi aktivasi logistic sigmoid. Penggunakan metode seleksi fitur dapat meningkatkan kinerja transfer learning sehingga nilai akurasinya lebih tinggi dibandingkan simulasi tanpa penggunaan metode seleksi fitur.

Sentiment analysis is a part of data mining (text mining), the process of understanding, extracting, and processing textual data automatically to obtain information. In this research, sentiment analysis is applied to one social media platform, Twitter. Sentiment analysis is a classification problem that can be solved with a machine learning method, namely a Neural Network. In machine learning, data is divided into training data and test data that come from the same domain.
The main problem in this research is that the training data and test data come from two different domains, so a form of learning other than standard machine learning is needed. The problem can be solved with transfer learning, in which a model is built from training data in one domain and tested on data from a different domain. The simulation in this research yields a transfer learning accuracy for the Neural Network method, tested with n-gram features (bi-grams and tri-grams) and one feature selection method, the Extra-Trees Classifier.
In this research, the highest transfer learning accuracy is obtained with one hidden layer. Most of the highest accuracy values are obtained with 250 neurons in the hidden layer. The ReLU and tanh activation functions yield higher accuracy than the logistic sigmoid activation function. Using the feature selection method improves transfer learning performance, giving higher accuracy than the simulation without feature selection."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2018

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Rifqy Mikoriza Turjaman

Analisis Sentimen Berbasis Aspek Marketing Mix Terhadap Ulasan Aplikasi Dompet Digital: Studi Kasus Aplikasi LinkAja pada Twitter = Sentiment Analysis Based on Marketing Mix Aspect of Digital Wallet Application Reviews: Case Study LinkAja Application on Twitter

"Selama pandemi COVID-19 yang telah melanda dunia sejak akhir tahun 2019, transaksi dengan metode pembayaran cashless mengalami peningkatan signifikan. LinkAja sebagai salah satu perusahaan dompet digital di Indonesia yang melayani pembayaran cashless, perlu meningkatkan daya saing di tengah ketatnya persaingan bisnis dompet digital. Salah satunya adalah dengan meningkatkan kepuasan konsumen dengan memperhatikan berbagai aspek berdasarkan teori marketing mix 4P. Aspek yang digunakan berdasarkan teori marketing mix terdiri dari beberapa elemen yang umum yang disebut dengan 4P, yaitu produk (product), harga (price), tempat (place), dan promosi (promotion). Penelitian ini berfokus untuk melakukan sentimen analisis berbasis aspek untuk mengetahui aspek mana yang mendapat penilaian positif, negatif, atau netral dari data ulasan yang diberikan konsumen. Hasil penelitian dapat digunakan sebagai referensi bagi LinkAja dalam menentukan aspek mana yang perlu diprioritaskan dalam upaya meningkatkan daya saing perusahaan. Data yang digunakan dalam penelitian ini merupakan data Twitter yang berkaitan dengan mention akun @linkaja dengan periode 1 Januari 2022 hingga 17 Mei 2022. Penelitian ini melakukan klasifikasi aspek menggunakan string matching menggunakan library Thefuzz. Kemudian klasifikasi sentimen dilakukan menggunakan algoritma SVM. Pada kasus dataset imbalance, dilakukan proses undersampling untuk menyeimbangkan kelas dalam dataset. Hasil klasifikasi menunjukkan bahwa aplikasi LinkAja mendapatkan sentimen negatif pada aspek produk dengan 98% dari total ulasan dan aspek tempat dengan 100% dari total ulasan, kemudian sentimen netral pada aspek harga sebesar 89% dari total ulasan, dan aspek promosi mendapatkan sentimen positif sebanyak 98% dari total ulasan.

During the COVID-19 pandemic, which has affected the world since the end of 2019, transactions using cashless payment methods have increased significantly. LinkAja, one of the digital wallet companies in Indonesia that serves cashless payments, needs to increase its competitiveness amid intense competition in the digital wallet business. One way is to increase customer satisfaction by paying attention to various aspects based on the 4P marketing mix theory: product, price, place, and promotion. This study conducts aspect-based sentiment analysis to determine which aspects receive positive, negative, or neutral ratings in consumer review data. The results can serve as a reference for LinkAja in determining which aspects to prioritize to improve the company's competitiveness. The data used are tweets mentioning the @linkaja account from January 1, 2022 to May 17, 2022. Aspect classification is performed by string matching with the Thefuzz library, and sentiment classification is performed with the SVM algorithm. Where the dataset is imbalanced, undersampling is applied to balance the classes. The classification results show that the LinkAja application receives negative sentiment on the product aspect (98% of total reviews) and the place aspect (100% of total reviews), neutral sentiment on the price aspect (89% of total reviews), and positive sentiment on the promotion aspect (98% of total reviews)."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Mochamad Reza Rahadi

Implementasi dan Evaluasi Metode Feature Engineering FinBERT untuk Prediksi Harga Saham Berbasis Sentimen Analisis Menggunakan BiLSTM = Implementation and Evaluation of FinBERT Feature Engineering Method for Sentiment Analysis-Based Stock Price Prediction Using BiLSTM

Meskipun teknologi telah mempengaruhi hampir semua aspek industri finansial, penelitian yang terfokus pada penggunaan teknologi pemrosesan teks dan analisis sentimen dalam konteks prediksi harga saham masih belum banyak dilakukan. Manfaat dan potensi dari penelitian semacam ini memiliki pengaruh yang tinggi, terutama karena analisis sentimen telah menjadi komponen yang penting dalam memprediksi tren pasar saham. Dalam penelitian ini, penulis mengusulkan penerapan metode feature engineering dalam memprediksi harga saham dengan memanfaatkan hasil analisis sentimen menggunakan FinBERT, lalu hasilnya akan dijadikan fitur oleh model BiLSTM. FinBERT adalah model berbasis BERT yang telah dilatih khusus untuk memproses dan menginterpretasi teks keuangan, sementara BiLSTM adalah arsitektur jaringan saraf berulang yang mampu mengatasi masalah yang ada pada jaringan saraf berulang standar seperti vanishing gradient dan efektif dalam mengolah data sekuensial. Penelitian ini menggabungkan kedua teknik ini untuk menciptakan model yang mampu memprediksi pergerakan harga saham berdasarkan analisis sentimen berita keuangan dengan nilai rata-rata MSE yang lebih rendah. Feature engineering digunakan dalam penelitian ini untuk mengekstrak dan mengolah informasi yang relevan dari dataset oleh model FinBERT untuk digunakan pada model BiLSTM. Dengan menggunakan metode feature engineering, ditemukan bahwa model BiLSTM yang menggunakan fitur sentimen analisis memiliki performa tertinggi dengan memiliki rata-rata nilai MSE terkecil dalam memprediksi tujuh saham yang memiliki karakteristik berbeda dengan nilai 3.43, nilai tersebut merupakan rata-rata terkecil dibandingkan tiga model lain dalam penelitian ini seperti LSTM dengan nilai MSE 4.04, Random Forest dengan nilai MSE 9.77, dan SVM dengan nilai MSE 12.56. 
Selanjutnya, proses optimisasi model BiLSTM menggunakan Optuna ditemukan nilai hyperparameter terbaik dalam menghadapi tujuh jenis saham yang berbeda, sehingga model mampu memprediksi lebih akurat dengan penurunan rata-rata nilai MSE hingga 40.55%. Sebagai bentuk validasi akhir pada penelitian ini telah dilakukan uji fold untuk mendapatkan model yang tidak overfitting dan memiliki rata-rata nilai MSE terkecil dengan variasi nilai hyperparameter batch size. Ditemukan batch size 16 merupakan ukuran paling optimal untuk tipe data NVDA,XOM dan TSLA dengan rata-rata MSE terkecil 0.64, 0.35, 0.05 sedangkan batch size 24 merupakan ukuran paling optimal untuk tipe data saham AAPL, AMZN, GOOG dan GOOGL dengan rata-rata MSE terkecil 0.028, 0.02, 0.03, 0.04, dan 0.03. Dalam menggunakan fitur sentimen analisis berhasil membuktikan menurunkan nilai MSE pada masing-masing jenis saham hingga rata-rata penurunan nilai MSE mencapai 33.10% dari semua jenis variasi data saham tanpa menggunakan fitur sentimen.

Although technology has influenced nearly all aspects of the financial industry, there is still a lack of research focusing on the use of text processing technology and sentiment analysis in the context of stock price prediction. The benefits and potential of such research are significant, especially as sentiment analysis has become a crucial component in predicting stock market trends. In this study, the authors propose the application of feature engineering to predict stock prices by utilizing sentiment analysis results using FinBERT, which are then used as features by the BiLSTM model. FinBERT is a BERT-based model specifically trained to process and interpret financial text, while BiLSTM is a recurrent neural network architecture capable of overcoming problems inherent in standard recurrent neural networks, such as the vanishing gradient, and is effective in processing sequential data. This study combines these two techniques to create a model capable of predicting stock price movements based on sentiment analysis of financial news with a lower average MSE value. Feature engineering is used in this study to extract and process relevant information from the dataset by the FinBERT model to be used in the BiLSTM model. By using feature engineering, it was found that the BiLSTM model using sentiment analysis features has the highest performance, having the lowest average MSE value in predicting seven stocks with different characteristics, with a value of 3.43, which is the smallest average compared to the three other models in this study, such as LSTM with an MSE value of 4.04, Random Forest with an MSE value of 9.77, and SVM with an MSE value of 12.56. Furthermore, the optimization process of the BiLSTM model using Optuna found the best hyperparameters in dealing with seven different types of stocks, enabling the model to predict more accurately with an average reduction in MSE value up to 40.55%. 
As a final form of validation in this study, a fold test was conducted to obtain a model that does not overfit and has the smallest average MSE across different values of the batch size hyperparameter. A batch size of 16 was found to be optimal for the NVDA, XOM, and TSLA data, with smallest average MSE values of 0.64, 0.35, and 0.05, while a batch size of 24 was found to be optimal for the AAPL, AMZN, GOOG, and GOOGL stock data, with smallest average MSE values of 0.028, 0.02, 0.03, 0.04, and 0.03. Using sentiment analysis features was shown to reduce the MSE for each stock, with an average reduction of 33.10% across all stock data variations compared with not using sentiment features."

Depok: Fakultas Teknik Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Fathia Amira Nuramalia

Analisis Kinerja Bidirectional Long Short-Term Memory pada Analisis Sentimen Twitter Berbahasa Indonesia = Performance Analysis of Bidirectional Long Short-Term Memory on Twitter Sentiment Analysis in Indonesian Language

"Twitter adalah platform media sosial microblogging yang memungkinkan komunikasi dua arah untuk mengutarakan opini dan komentar. Komentar-komentar yang beragam ini dapat memperlihatkan sentimen-sentimen masyarakat apabila dilakukan analisis sentimen. Analisis sentimen adalah studi yang menganalisis opini orang terhadap suatu produk, organisasi, individu, atau jasa tertentu. Machine learning merupakan metode yang dapat mempermudah proses klasifikasi sentimen. Penelitian ini dilakukan pada cuitan berbahasa Indonesia terkait Kampus Merdeka yang diambil dari Twitter menggunakan package tweepy sebanyak 1.651 cuitan terhitung dari tanggal 5 Maret 2022 hingga 13 Maret 2022. Model machine learning yang digunakan pada penelitian ini adalah Bidirectional Long Short-Term Memory (BiLSTM), dengan dua model hybrid LSTM-based, yaitu CNN-LSTM dan LSTM-CNN sebagai pembanding. Kinerja model diukur dengan metrik kinerja accuracy, precision, recall, dan F1-score. Implementasi dilakukan pada data yang telah dilakukan oversampling untuk mendapatkan hasil yang optimal. Penelitian menunjukkan bahwa model BiLSTM memiliki kinerja yang lebih unggul dibandingkan dengan dua model pembanding lainnya pada seluruh metrik dengan besar metrik, yaitu: accuracy dan recall sebesar 79,577%; precision sebesar 73,097%; dan F1-score sebesar 75,634%.

Twitter is a microblogging social media platform that allows two-way communication for expressing opinions and comments. These varied comments can reveal public sentiment when sentiment analysis is performed. Sentiment analysis is the study of people's opinions toward a specific product, organization, individual, or service. Machine learning is a method that can simplify sentiment classification. This study analyzes 1,651 Indonesian-language tweets about Kampus Merdeka, collected from Twitter with the tweepy package from March 5, 2022 to March 13, 2022. The machine learning model used is Bidirectional Long Short-Term Memory (BiLSTM), with two LSTM-based hybrid models, CNN-LSTM and LSTM-CNN, as comparison models. Model performance is measured with accuracy, precision, recall, and F1-score. The models are applied to data that has been oversampled to obtain the best results. The study shows that BiLSTM outperforms the two comparison models on all metrics, with 79.577% accuracy and recall, 73.097% precision, and a 75.634% F1-score."
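The abstract reports identical values for accuracy and recall. One situation in which this always happens is when per-class recall is averaged with weights proportional to class support, because the weighted sum then reduces to the overall fraction of correct predictions. The abstract does not state which averaging was used, so the sketch below is only an illustration of that identity under this assumption; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to each class's share of the data."""
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    n = len(y_true)
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)
```

Each term simplifies to `hits[c] / n`, so the sum equals the accuracy for any label set. Macro-averaged recall, which weights every class equally, does not share this property.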

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
