Sentiment Analysis towards Full Movie Dirty Vote 2024 in X Using Support Vector Machine Method



Introduction
In the past, people expressed their opinions, criticism, and suggestions through print media, where not everyone had the ability to write or the opportunity to publish their writing. Developments in communication technology have since changed how people express their opinions and ideas. Social media has become a means of sharing information for people throughout the world (Nemes & Kiss, 2021; Li et al., 2022), and its users write opinions on many topics and discuss current issues.

Film is an audio-visual communication medium (sound and image) that carries implied messages from the filmmaker to the group of people who watch it. These implied messages shape the audience's understanding of the film's meaning: by watching, viewers are drawn into the storyline, which influences their perceptions and prompts them to express opinions about the plot, setting, characterization, and ending of the story (Tchernev et al., 2021; Ahmed & Lugovic, 2019). An opinion is a person's view of an issue, and opinions expressed by different people on the same subject can lead to different judgments. As information technology develops ever more rapidly, more and more people write their opinions about products and services on social media (Alaimo et al., 2020; Xu).

One of the social media platforms widely used by the public to convey opinions is X. Social media users in Indonesia reached 170 million at the beginning of 2021 (Mustaqlillah et al., 2023), and the 16-64 age group accounts for 63.6% of these users, an age range that corresponds to eligibility to vote in general elections. Because Indonesian netizens express many opinions on social media, especially on X, a wide range of reactions emerges. To identify and determine the tendencies of the tweets X users post, sentiment analysis is needed (Pratama et al., 2019; Naufal, 2023; Saputra, 2022).
In the context of social media, sentiment analysis is the analysis of how people express their opinions on various topics (Nemes & Kiss, 2021). Sentiment analysis has been widely applied, for example to learn consumer responses to a product or to gauge political preferences.
Based on the above, this sentiment analysis was designed to identify tweets posted on X's timeline that contain positive and negative words regarding the film Dirty Vote 2024. In this research, the method used to classify sentiment toward Dirty Vote 2024 is the Support Vector Machine (SVM). SVM is widely used for data classification, especially text data. One of its advantages is that it is relatively easy to implement, because the process of determining the support vectors can be formulated as a quadratic programming (QP) problem. Sentiment analysis and opinion mining are fields of study that analyze a person's opinions, sentiments, evaluations, attitudes, and emotions as expressed in written language (Silitonga & Sihotang, 2019).

Methods
This study employs a quantitative method to analyze sentiment toward the movie "Dirty Vote 2024" using data acquired from the X platform. The Support Vector Machine (SVM) algorithm is applied in the sentiment analysis process because of its efficiency in text classification, here used to determine positive or negative sentiment. The research process was laid out from the moment the research topic and objectives were defined, with specific emphasis on identifying the general public's attitude toward the "Dirty Vote 2024" movie on social networks. The dataset, consisting of 1,500 tweets about "Dirty Vote 2024", was collected with Tweepy, a Python package for accessing the Twitter API. Collection was performed over a fixed period so that the samples capture sentiment during the movie's release and the surrounding discourse. Links, hashtags, and other special characters were removed from the tweets to ensure high-quality analysis.
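The link, hashtag, and special-character removal described above can be sketched as a small Python function. This is a minimal illustration: the regular expressions and the function name `clean_tweet` are assumptions, not the authors' actual code.

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative cleaner: strips links, mentions, hashtags, and
    special characters, then normalizes whitespace and case."""
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions/hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # remove special characters
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("Nonton #DirtyVote2024 di https://x.com/abc, bagus banget!!"))
# → nonton di bagus banget
```

A cleaner like this would typically run before tokenization, so that stray punctuation does not end up as spurious tokens.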
Each tweet was then passed through a strict text cleaning process in which it was tokenized, stopwords were removed, and the remaining words were stemmed. Tokenization refers to splitting the text into individual words or tokens, while stopword removal discards words irrelevant to sentiment analysis, such as 'and' and 'the'. Stemming reduces words to their stems, standardizing the text data. Each tweet was then labelled manually according to its sentiment value: a tweet was labelled positive if its sentiment value was greater than 0 and negative if it was less than 0, using sentiment dictionaries and a domain-specific lexicon relevant to the context of the movie. Following labelling, text mining was applied to extract features from the preprocessed text. For feature extraction, Term Frequency-Inverse Document Frequency (TF-IDF) was used to transform the text into numerical form that the SVM could analyze. TF-IDF assigns a weight to each word based on its frequency within a specific tweet and its rarity across the whole set of tweets. SVM was chosen for its performance on high-dimensional data and its suitability for binary classification problems. A linear kernel was chosen because the sentiment features are linearly separable, making it possible to distinguish clearly between the positive and negative classes. The data was divided into training and test sets at a ratio of 8:2, meaning 80% of the data was used for training the model and the remaining 20% for testing its performance. The training set was used to find the decision boundary separating the two sentiment classes, while the test set was used to evaluate the model's accuracy, precision, recall, and F1-score.
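The TF-IDF feature extraction, 8:2 split, and linear-kernel SVM described above can be sketched with scikit-learn. The tiny tweet list and its labels below are invented for illustration only; they are not the study's dataset, and the variable names are assumptions.

```python
# Sketch of the TF-IDF + linear-kernel SVM pipeline, under the
# assumption that scikit-learn is used; the data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

tweets = [
    "great movie very inspiring", "good discussion important film",
    "bad fraud terrible claims", "boring misleading bad movie",
    "inspiring good honest film", "terrible boring fraud story",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF turns each tweet into a weighted term vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets)

# 8:2 split as in the study (80% train, 20% test).
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels)

model = SVC(kernel="linear")  # linear kernel, as chosen in the paper
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out 20%
```

On the real 1,500-tweet dataset the same pipeline applies unchanged; only the input lists differ.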
To assess the SVM model, the predicted sentiments were cross-tabulated against the true labels, producing a confusion matrix from which the model's performance was measured. Evaluation used the standard measures of accuracy, precision, recall, and F1-score for tweet sentiment classification. These metrics enabled a fairly comprehensive assessment, balancing precision against the number of true positives recovered. The research was conducted over one month, and all computation was performed in the Python environment. The X platform was chosen as the research setting because of its relevance as an arena of public opinion on social media.

SVM Classification
After the data has been cleaned and structured, the next step is classification using the Support Vector Machine (SVM) algorithm. The first stage of the classification process is dividing the data into training data and test data; in this research a ratio of 8:2 is used. The training data is used to learn the characteristics that distinguish the positive and negative classes, while the test data is used to evaluate the percentage of correct classifications.
The results of the linear kernel calculations on the sample data are shown in the table below, together with an example calculation for row 1, column 1. Each cell reports the linear kernel K(x, y) = x · y for a pair of data points: the columns and rows represent different samples, and the value for each pair is placed in the corresponding cell. For example, the value in row 1, column 1 is 0.042, the result of the linear kernel between sample 1 and itself.
This kernel calculation is important because it is how the SVM maps the data into a higher-dimensional space, where the data can be separated more easily by a hyperplane. The value in row 1, column 1 is obtained by multiplying the elements of the first sample's feature vector by itself: 0.206 × 0.206 ≈ 0.042.
This calculation is performed for all pairs of samples, and the results are arranged in a table to support the next step in the SVM algorithm, the calculation of the Hessian matrix.
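The worked example can be reproduced directly in code. The one-element feature vectors below are hypothetical stand-ins that mirror the 0.206 value from the example; the real TF-IDF vectors would have one component per vocabulary term.

```python
import numpy as np

# Hypothetical TF-IDF feature vectors for two sample tweets; the
# value 0.206 mirrors the worked example in the text.
x1 = np.array([0.206])
x2 = np.array([0.135])

def linear_kernel(a, b):
    """K(x, y) = x . y (ordinary dot product)."""
    return float(np.dot(a, b))

# K(x1, x1) = 0.206 * 0.206 ≈ 0.042, matching row 1, column 1.
print(round(linear_kernel(x1, x1), 3))  # → 0.042
```

Evaluating this function for every pair of samples fills in the full kernel table.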
Once the kernel values are obtained, the next step is to calculate the Hessian matrix. To do this, several parameters, namely C, γ, λ, and the maximum number of iterations, must be determined first. These parameters affect how the Hessian matrix is calculated and how the SVM model is optimized to find a decision boundary that effectively separates the classes in the dataset. Table 4.17 provides a transparent and structured basis for understanding and reproducing this process, which is key in scientific research and practical applications of SVM.
Table 3. Parameter Values

Hessian Matrix Calculation
The Hessian matrix calculation begins with the initial value α = 0; the update δα is then computed with the corresponding equation.
From the δα results obtained, the new αi values are calculated with the update equation. Next, the total weight of the test data is used to compute the decision function value f(x). Using the hyperplane that has been obtained, data on the positive side yields wx + b = 1, while data on the negative side yields wx + b = -1.
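Because the update equations themselves are not reproduced in the text, the following is a hedged sketch of the standard sequential-learning SVM update that matches the parameters named above (αi, C, γ, λ, maximum iterations). Read it as the common formulation of this training scheme, not a verbatim transcription of the authors' equations; the data values are invented.

```python
import numpy as np

def sequential_training(K, y, C=1.0, gamma=0.01, lam=0.5, max_iter=100):
    """Standard sequential-learning SVM update (assumed formulation):
      D_ij     = y_i * y_j * (K_ij + lambda^2)
      delta_i  = gamma * (1 - sum_j D_ij * alpha_j)
      alpha_i  = clip(alpha_i + delta_i, 0, C)
    """
    n = len(y)
    D = np.outer(y, y) * (K + lam ** 2)
    alpha = np.zeros(n)                    # start from alpha = 0
    for _ in range(max_iter):
        delta = gamma * (1.0 - D @ alpha)  # delta-alpha for every sample
        alpha = np.clip(alpha + delta, 0.0, C)
    return alpha

# Tiny invented dataset: 3 samples, 2 TF-IDF features, labels +/-1.
X = np.array([[0.206, 0.0], [0.0, 0.135], [0.11, 0.05]])
y = np.array([1.0, -1.0, 1.0])
K = X @ X.T                                # linear kernel matrix
alpha = sequential_training(K, y)
print(alpha.min() >= 0.0 and alpha.max() <= 1.0)  # prints True
```

The clipping step enforces the box constraint 0 ≤ αi ≤ C at every iteration, which is what allows the loop to run for a fixed maximum number of iterations without diverging.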
If the data produces a positive value, it is classified into the positive class; conversely, if it produces a negative value, it is classified into the negative class. One of the test tweets, after preprocessing, reduced to the tokens ['crowd', 'good', 'movie', 'accept', 'cheat', 'discussion', 'accept', 'flow', 'fund', 'movie']. After carrying out tests on the two test data points, the classification function yielded a value of 1 for both, so both were classified into class 1, the Agree class. The support vector machine predicts the positive or negative class by analyzing the training data and the information it contains: the training data includes both sentiment classes, and the SVM learns the patterns and characteristics of the words associated with each. A word cloud, a visual representation of the words that commonly appear in positive and negative sentiment, is shown below. The classification process begins by dividing the dataset into two parts, a training set and a testing set, at a ratio of 8:2.
This means that 80% of the data is used to train the model, while the remaining 20% is used to test the model.This division allows the training set to capture the underlying patterns and features that distinguish the positive and negative classes thoroughly.Thus, this extensive training phase helps the SVM algorithm learn the optimal decision boundary that can effectively separate the two classes.Keeping 20% of the data for the testing set aims to evaluate the performance and accuracy of the model on new data that has never been seen before.This assessment is important because it measures the ability of the SVM model to generalize beyond the data used for training.Evaluation on the testing set provides insight into the robustness and predictive power of the model, ensuring that the learned decision boundary is not just a result of overfitting to the training data, but also a reliable classifier for future events.
After testing of the Support Vector Machine algorithm is complete, the result is a set of test data labels produced by the model. The predicted sentiment classes of the test data are compared with the original classes, from which the accuracy, precision, recall, and F1-score of the model on the dataset can be computed. These overall values can be read from the classification report. The classification report and confusion matrix from the sentiment analysis carried out with the Support Vector Machine algorithm follow.
ISSN: 2716-3865 (Print), 2721-1290 (Online) Copyright © 2024, Journal La Multiapp, Under the license CC BY-SA 4.0
The figure below presents the classification report from the Support Vector Machine (SVM) model used for sentiment analysis of tweets about the "Dirty Vote 2024" movie. The report shows that the model classified 86% of tweets correctly into the positive or negative sentiment categories. Precision is 86%, meaning that when the model classified a tweet as positive, it was correct 86% of the time. Recall, which measures the model's ability to identify all actually positive items, is a solid 100%, indicating the model missed no actual positive tweet. The F1-score, which takes both precision and recall into account, is 93%, reflecting a well-balanced model suited to supporting policy decision-making. Overall performance is strong, but the precision reveals room for improvement: the false positives may stem from difficulty distinguishing shades of meaning in posts shared on social media. The report thus confirms the success of the proposed model while establishing the work that remains in sentiment analysis of free-wheeling, constantly evolving social media text. Figure 4.19 shows the accuracy, precision, recall, and F1-score values. Accuracy measures how well a classification model predicts the correct class of the data overall; precision measures how well it predicts the positive class; recall measures how well it identifies true positive data; and the F1-score balances precision and recall.
From Table 4.29, the accuracy, precision, recall, and F1-score values can be calculated using the equations below.
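As a worked illustration of those equations, the four metrics can be computed from confusion-matrix counts. The counts below (TP = 86, FP = 14, FN = 0, TN = 0) are hypothetical values chosen to mirror the reported 86% accuracy, 86% precision, and 100% recall; the study's actual counts may differ.

```python
# Hypothetical confusion-matrix counts (not the study's real numbers).
tp, fp, fn, tn = 86, 14, 0, 0

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct / all
precision = tp / (tp + fp)                   # correct positives / predicted positives
recall = tp / (tp + fn)                      # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 2))
```

With these rounded illustrative counts the F1-score works out to about 0.92; the 93% reported in the paper follows from the unrounded precision and recall on the actual test set.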

Application
The method applied to classify the data in this research is the Support Vector Machine (SVM). The aim of this research is to measure the accuracy of the system being developed and to determine the sentiment classification of the film Dirty Vote 2024 based on public opinion on X.
The classification accuracy of 86%, taken alone, may not be a good criterion of how often the model makes correct predictions, since this metric does not reflect the model's reliability in real-life scenarios. The misclassification rate of about 14% could be critically important in applications that rely on sentiment analysis for key decisions, such as political campaigns or brand management (Yadav et al., 2023). The model's precision is likewise 86%, meaning it is right 86% of the time when it assigns positive sentiment. The 14% false positive rate raises questions about the model's ability to generalize its findings, particularly its specificity: its ability to correctly differentiate between expressions of sentiment such as sarcasm, irony, and other contextual phrases, factors that pose significant problems in natural language processing (Hosseini et al., 2022).
The shortfall in precision points to a larger problem of sentiment analysis on platforms like X, where text is short and informal. As recent papers describe, even state-of-the-art models fail in such situations, especially under ambiguity and when they rely only on syntactic features without semantic and contextual meaning (Goh et al., 2023; Baeldung, 2023). These findings imply the need to evolve toward more capable NLP approaches, including contextual embeddings or hybrid models that combine rule-based and machine learning components (Sun et al., 2023; Appinventiv, 2023).
The perfect recall of 100% shows that the model correctly classified all of the positive sentiment in the dataset, which is impressive. But this also serves as a reminder of the risk of overfitting, where the model has learned the characteristics of the training data too specifically, reducing its ability to generalize to unseen datasets (Cresswell, 2024; Wang et al., 2023). This is a common problem in sentiment analysis scenarios: high recall on the training data may translate into poor performance when the model is deployed on another dataset or, even more so, another domain where the language used differs (Wang et al., 2023; Artificial Intelligence Review, 2023). Consequently, the F1-score of 93% indicates that the model achieves a good balance of precision and recall, the trade-offs of which are well described in the sentiment classification literature. Nevertheless, this metric should be used with caution, especially given that steps during data preprocessing, including tokenization and stopword removal, may introduce biases into the analysis by hiding part of the actual linguistic context (Luo et al., 2022; Artificial Intelligence Review, 2023).
The employment of SVM in this study reflects the continued contemporary use of the classifier in practical sentiment analysis, especially where interpretability and computational cost are the focus. These observations should nonetheless be put into perspective against the broader theory underlying sentiment analysis. For example, recent research directions in NLP point toward contextual factors, such as user metadata or temporal and network-based features, that greatly improve a model's grasp of sentiment beyond the words themselves (Liu et al., 2022). Although the current study effectively implemented and applied SVM, it could have incorporated these advances, especially given the short texts of social media. Moreover, the linearity of the SVM used here, while a merit for simplicity, may be a constraint when capturing more complex, non-linear patterns in the data. Recent studies have shown that deep learning models such as transformers and recurrent neural networks outperform traditional machine learning algorithms like Support Vector Machines in large-scale sentiment analysis over diverse types of data (Chen & Zhou, 2023; Appinventiv, 2023). These models are better at understanding contextual and long-range dependencies, which are essential for accurate classification of sentiment (Yang et al., 2023).

Conclusion
The F1-score shows that SVM is an accurate and efficient tool for sentiment classification. The precision metric, however, which points to a 14% rate of false positives, reveals a limitation: the model's difficulty with data such as sarcasm, irony, and other complex expressions of sentiment. This is a typical drawback of sentiment analysis, as the simplicity of a binary choice of classes does not capture the full range of emotions behind social media interactions. The implication is that although SVM excels in numerous cases, future tasks may not always be best solved by it. Overfitting not only threatens the model's generalizability to new datasets but also highlights a critical vulnerability of the current approach: training data bias, whereby the model performs well when tested on data resembling its training set but may fail on different or more complex real-world data. This is a significant drawback for applied science, since the models we wish to deploy must not only be precise but also able to generalize from sample to population. This study expands the field's knowledge of developing sentiment analysis approaches for real-world applications, in terms of how quickly they can be implemented and how easily they can be explained. The results also highlight the need to include improved NLP tools when analyzing sentiment in social media posts. In future studies, more emphasis should be placed on integrating context-aware embeddings that exploit semantic and syntactic features, and on hybrid models that merge the feasibility of a simple, rigid model such as SVM with the complexity and depth of deep learning. Such developments might improve the accuracy of sentiment analysis without compromising the robustness of the model on a given dataset.

Table 2 .
Linear Kernel Calculation. Table 2 contains the linear kernel calculations for the Support Vector Machine (SVM) algorithm. Each value in this table shows the result of calculating the linear kernel function K(x, y) = x · y.

Table 4 .
Hessian Matrix Calculation Results. In the initial calculation, the iteration starts from iteration 0, because the initial α value is still 0. The results of the sequential training calculation are shown in the table below.

Table 7 .
Calculation Results of δαi in Iteration 1. The next process is to calculate the kernel between each test data point and the previously processed training data. The results of the kernel calculations between training data and test data follow.

Table 9 .
Term Weighting in Test Data