Language features in imaginative texts. Stage 1. Overview. Learning intention: Students will learn to identify the ways language in narrative texts can be used to make inferences.

Language Features of a Critical Review

Writing a Critical Review. Here is a sample extract from a critical review of an article. Only the introduction and conclusion are included. Parts of the review have been numbered [1] – [12]. Read the extract and match them with the language features listed here:
a. Concessive clauses assist in expressing a mixed response
b. Conclusion summarises the reviewer's judgement
c. Introduction
d. Modality used to express certainty and limit overgeneralising
e. Offers recommendations
f. Presents the aim/purpose of the article and key findings
g. Qualifies the reviewer's judgement
h. Reporting verbs
i. Reviewer's judgement
j. Sentence themes focus on the text
k. Title and bibliographic details of the text
l. Transition signals provide structure and coherence

[1] A Critical Review of Goodwin et al. 2000, 'Decision making in Singapore and Australia: the influence of culture on accountants' ethical decisions', Accounting Research Journal, no. 2, pp. 22–36. [2] Using Hofstede's (1980, 1983, 1991) and Hofstede and Bond's (1988) five cultural dimensions, Goodwin et al. (2000) conducted [3] a study on the influence of culture on ethical decision making between two groups of accountants from Australia and Singapore. [4] This research aimed to provide further evidence on the effect of cultural differences, since results from previous research have been equivocal. [5] The study reveals that accountants from the two countries responded differently to ethical dilemmas, in particular when the responses were measured using two of the five cultural dimensions. The result agreed with the prediction, since considerable differences existed between these two dimensions in Australians and Singaporeans (Hofstede 1980, 1991). [6] However, the results of the other dimensions provided less clear relationships, as the two cultural groups differed only slightly on those dimensions. [7] To the extent that this research is exploratory, results of this study provide insights into the importance of recognising cultural differences for firms and companies that operate in international settings. However, several limitations must be considered in interpreting the study findings. … [8] In summary, it has to be admitted that the current study is [9] still far from being conclusive.
[10] Further studies must be undertaken, better measures must be developed, and larger samples must be used to improve our understanding concerning the exact relationship between culture and decision making. [11] Despite some deficiencies in methodology, [12] to the extent that this research is exploratory, trying to investigate an emerging issue, the study has provided some insights into accounting for culture in developing ethical standards across national borders.

WHAT IS A BOOK REVIEW? Traditionally, book reviews are written evaluations of a recently published book in any genre. Usually around the 500-to-700-word mark, they offer a brief description of a text's main elements while appraising the work's strengths and weaknesses. Published book reviews can appear in newspapers, magazines, and academic journals. They provide the reader with an overview of the book itself and indicate whether or not the reviewer would recommend the book to the reader.

WHAT IS THE PURPOSE OF A BOOK REVIEW? There was a time when book reviews were a regular appearance in every quality newspaper and many periodicals. They were essential elements in whether or not a book would sell well. A review from a heavyweight critic could often be the deciding factor in whether a book became a bestseller or a damp squib. In the last few decades, however, the book review's influence has waned considerably, with many potential book buyers preferring to consult customer reviews on Amazon, or sites like Goodreads, before buying. As a result, the book review's appearance in newspapers, journals, and digital media has become less frequent.

WHY BOTHER TEACHING STUDENTS TO WRITE BOOK REVIEWS AT ALL? Even in the heyday of the book review's influence, few students who learned the craft of writing a book review became literary critics! The real value of crafting a well-written book review for a student does not lie in their ability to impact book sales. Understanding how to produce a well-written book review helps students to:
● Engage critically with a text
● Critically evaluate a text
● Respond personally to a range of different writing genres
● Improve their own reading, writing, and thinking skills.

Not to Be Confused with a Book Report! WHAT'S THE DIFFERENCE BETWEEN A BOOK REVIEW AND A BOOK REPORT? While the terms are often used interchangeably, there are clear differences in both the purpose and the format of the two genres. Generally speaking, book reports aim to give a more detailed outline of what occurs in a book. A book report on a work of fiction will tend to give a comprehensive account of the characters, major plot lines, and themes in the book. Book reports are usually written around the K-12 age range, while book reviews tend not to be undertaken by those at the younger end of this age range due to the higher-level critical skills required in writing them. At their highest expression, book reviews are written at the college level and by professional critics.

Learn how to write a book review step by step with our complete guide for students and teachers by familiarizing yourself with the structure and features.

BOOK REVIEW STRUCTURE
● ANALYZE: Evaluate the book with a critical mind.
● THOROUGHNESS: The whole is greater than the sum of its parts. Review the book as a WHOLE.
● COMPARE: Where appropriate, compare to similar texts and genres.
● THUMBS UP OR DOWN? You will inevitably have to recommend or reject this book to potential readers.
● BE CONSISTENT: Take a stance and stick with it throughout your review.
FEATURES OF A BOOK REVIEW
● PAST TENSE: You are writing about a book you have already read.
● EMOTIVE LANGUAGE: Whatever your stance or opinion, be passionate about it. Your audience will thank you for it.
● VOICE: Both active and passive voice are used in reviews.

ELEMENTS OF A BOOK REVIEW As with any of the writing genres we teach our students, a book review can be helpfully explained in terms of criteria. While there is much to the 'art' of writing, there are also, thankfully, plenty of nuts and bolts that can be listed. Have students consider the following elements before writing:
● Title: Often, the title of the book review will correspond to the title of the text itself, but there may also be some examination of the title's relevance. How does it fit into the purpose of the work as a whole? Does it convey a message or reveal larger themes explored within the work?
● Author: Within the book review, there may be some discussion of who the author is and what they have written before, especially if it relates to the current work being reviewed. There may be some mention of the author's style and what they are best known for. If the author has received any awards or prizes, this may also be mentioned within the body of the review.
● Genre: A book review will identify the genre that the book belongs to, whether fiction or nonfiction, poetry, romance, science fiction, history, etc. The genre will likely tie in, too, with who the intended audience for the book is and what the overall purpose of the work is.
● Book Jacket / Cover: Often, a book's cover will contain artwork that is worthy of comment. It may contain interesting details related to the text that contribute to, or detract from, the work as a whole.
● Structure: The book's structure will often be heavily informed by its genre. Have students examine how the book is organized before writing their review. Does it contain a preface from a guest editor, for example? Is it written in sections or chapters? Does it have a table of contents, index, glossary, etc.? While all these details may not make it into the review itself, looking at how the book is structured may reveal some interesting aspects.
● Publisher and Price: A book review will usually contain details of who publishes the book and its cost. A review will often provide details of where the book is available too.

WHEN WRITING A BOOK REVIEW, YOUR GOAL IS TO GO BEYOND SIMPLY SCRATCHING THE SURFACE AND MAKE A DEEP ANALYSIS OF A TEXT.

BOOK REVIEW KEY ELEMENTS As students read and engage with the work they will review, they will develop a sense of the shape their review will take. This will begin with the summary. Encourage students to take notes during the reading of the work that will help them in writing the summary that will form an essential part of their review. Aspects of a work of fiction they may wish to take notes on include:
● Characters: Who are the main characters? What are their motivations? Are they convincingly drawn? Are they characters the reader can empathise with?
● Themes: What are the main themes of the work? Are there recurring motifs in the work? Is the exploration of the themes deep or surface only?
● Style: What are the key aspects of the writer's style? How does it fit into the wider literary world?
● Plot: What is the story's main catalyst? What happens in the rising action? What are the story's subplots?

A book review will generally begin with a short summary of the work itself. However, it is important not to give too much away. Remind students: no spoilers, please! For nonfiction works, this may be a summary of the main arguments of the work, again without giving too much detail away. In a work of fiction, a book review will often summarise up to the rising action of the piece without going beyond to reveal too much!

The summary should also provide some orientation for the reader. Given the purpose of a review, it is important that students consider their intended audience when writing their review. Readers will most likely not have read the book in question and will require some orientation. This is often achieved through introductions to the main characters, themes, primary arguments, etc. This will help the reader to gauge whether or not the book is of interest to them.

Once your student has summarized the work, it is time to 'review' in earnest. At this point, the student should begin to detail their own opinion of the book. To do this well, they should:

i. Make It Personal. Often when teaching essay writing, we talk to our students about the importance of climbing up and down the ladder of abstraction. Just as it is helpful to explore large, more abstract concepts in an essay by bringing them down to Earth, in a book review it is important that students can relate the characters, themes, ideas, etc., to their own lives. Book reviews are meant to be subjective. They are opinion pieces, and opinions grow out of our experiences of life. Encourage students to link the work they are writing about to their own personal life within the body of the review. By making this personal connection to the work, students contextualize their opinions for the readers and, in the process, help them to understand whether the book will be of interest to them or not.

ii. Make It Universal. Just as it is important to climb down the ladder of abstraction to show how the work relates to individual life, it is important to climb upwards on the ladder too. Students should endeavor to show how the ideas explored in the book relate to the wider world. This may take the form of the universality of the underlying themes in a work of fiction or, for example, the international implications of arguments expressed in a work of nonfiction.

iii. Support Opinions with Evidence. A book review is a subjective piece of writing by its very nature. However, just because it is subjective does not mean that opinions do not need to be justified. Make sure students understand how to back up their opinions with various forms of evidence, for example, quotations, statistics, and the use of primary and secondary sources.

EDIT AND REVISE YOUR BOOK REVIEW As with any writing genre, encourage students to polish things up with review and revision at the end. Encourage them to proofread and check for accurate spelling throughout, with particular attention to the author's name, character names, publisher, etc. It is good practice, too, for students to double-check their use of evidence. Are statements supported? Are the statistics used correctly?
Are the quotations from the text accurate? Left uncorrected, mistakes such as these can do great damage to the value of a book review, as they can undermine the reader's confidence in the writer's judgement.

The discipline of writing book reviews offers students opportunities to develop their writing skills and exercise their critical faculties. Book reviews can be valuable standalone activities or serve as part of a series of activities engaging with a central text. They can also serve as an effective springboard into later discussion work based on the ideas and issues explored in a particular book. Though the book review does not hold the sway it once did in the minds of the reading public, it still serves as an effective teaching tool in our classrooms today.

BOOK AND MOVIE REVIEW WRITING EXAMPLES (STUDENT WRITING SAMPLES) Below is a collection of student writing samples of book reviews. Please take a moment to read the movie or book review in detail, but also the teacher and student guides, which highlight some of the key elements of writing a text review. Please understand that these student writing samples are not intended to be perfect examples for each age or grade level, but pieces of writing for students and teachers to explore together and critically analyze in order to improve student writing skills and deepen their understanding of book review writing. We would recommend reading the examples a year above and below, as well as the grade you are currently working with, to gain a broader appreciation of this text type. Samples are available for Year 3, Year 4, Year 5, Year 6, Year 7 and Year 8.

The content for this page has been written by Shane Mac Donnchaidh, a former principal of an international school and English university lecturer with 15 years of teaching and administration experience. Shane's latest book is The Complete Guide to Nonfiction Writing. Editing and support for this article have been provided by the literacyideas team.

Identification of Review Helpfulness Using Novel Textual and Language-Context Features
Muhammad Shehrayar Khan, Atif Rizwan, Muhammad Shahzad Faisal, Tahir Ahmad, Muhammad Saleem Khan and Ghada Atteia (COMSATS University Islamabad; Jeju National University; Princess Nourah Bint Abdulrahman University). Mathematics 2022, 10, 3260.

Abstract: With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorization of movie reviews can be challenging because human language is complex, leading to scenarios where connotation words exist. Connotation words have a different meaning than their literal meanings. When representing a word, the context in which the word is used changes its semantics. In this research work, categorizing movie reviews with good F-Measure scores has been investigated with Word2Vec, and three different aspects of proposed features have been inspected.
First, psychological features are extracted from reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted; the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure the review's understandability score, and their impact on review classification performance is measured. Lastly, linguistic features are also extracted from reviews: adjectives and adverbs. The Word2Vec model is trained on a collection of 50,000 reviews related to movies. A self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions, while a pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated according to the performance measures accuracy, precision, recall and F-Measure. The results indicate that Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that using psychological, linguistic and readability features concatenated with the Word2Vec features, SVM achieved a further improvement in F-Measure.

Keywords: neural network; Word2Vec; Natural Language Processing; sentiment classification. MSC: 68T50; 68T07.
1. Introduction

Sentiment analysis is also known as opinion mining. The Natural Language Processing (NLP) technique is used to identify the sentiment polarity of textual data. It is one of the well-known research areas in NLP. People's attitudes and thoughts about any movie, event or issue are analyzed with sentiment analysis of reviews. Sentiment analysis of reviews classifies the review as having a positive or negative polarity, which helps the user decide about a product or a movie. While large volumes of opinion data can provide an in-depth understanding of overall sentiment, they require a lot of time to process. Not only is it time-consuming and challenging to review large quantities of texts, but some texts might also be long and complex, expressing reasoning for different sentiments, making it challenging to understand overall sentiment quickly once a new kind of communication has been started between a customer and a service provider. People share their opinions about services through websites. Usually, online products have thousands of reviews, and it is very difficult for customers to read every review. Excessive and improper use of sentiment in reviews makes them unclear concerning a product, and it becomes difficult for customers to make the right decision. This entailed a novel Few-Shot Learner approach applied to NLP tasks, including review sentiments, but focusing less on the impact of influential textual features [1]. In this scenario, sentiment-based review classification is a challenging research problem. Sentiment analysis is a hot topic due to its applications: quality improvement in products or services, recommendation systems, decision making and marketing research [2]. The major contributions of the research are as follows:
• The proposed psychological features are positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words.
• The readability features, extracted according to the Automated Readability Index (ARI), Coleman-Liau Index (CLI) and Word Count (WC), are calculated to measure the review's understandability score.
• The linguistic features extracted are adjectives and adverbs.
• The psychological, readability and linguistic features are concatenated with Word2Vec features to train the machine-learning models.

Various methods have been used to investigate data and convert raw data into valuable data. One of the applications of computing is NLP [3,4]. Many advanced algorithms and novel approaches have improved sentiment classification performance, but more productive results can be achieved if helpful textual reviews are used for sentiment classification. New features are adverbs and adjectives in terms of sentiment classification [5,6], describing the author's sentiments.
The clout feature defines the confidence of the review written by the author. The review length feature determines the information that a review carries, and the readability feature defines how much information can be understood or absorbed by the user. The readability feature also determines the complexity of a review for the reader. Reviews are generally short in length, representing opinions about products or movies. A review given by a user has an important role in the promotion of a movie [7]. Most people generally search for information about a movie on famous websites such as IMDb, a collection of thousands of movies that stores data about a movie's crew, reviews by different users, cast and ratings. It is surely not the only way to bring people to cinemas; in this regard, reviews also play an important role. Sentiment analysis makes opinion summarization in movie reviews easier by extracting the sentiment given in the review by the reviewer [8]. Sentiment analysis of movie reviews normally includes preprocessing [9], feature extraction with appropriate selection [10], classification and evaluation of results. Preprocessing includes converting all capitalized words into lower-case words due to case sensitivity, stop word removal and removing special characters before classification. Different feature-extraction methods are used to extract features from the review of a movie or product [11]. Most feature-extraction methods are related to lexicon- and statistical-based approaches. In statistical feature-extraction methods, the words that exist in reviews represent features through different weighting calculations such as Inverse Document Frequency (IDF), Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) [12,13]. In lexicon-based feature extraction, textual features are extracted from the patterns derived among the words, based on the parts-of-speech tags of words [14]. The lexicon-based method extracts the semantics from the review by focusing on text ordering in sentiment analysis, short text and keyword classification. Emotions expressed in short texts written on social networking sites have become popular; emotions used in reviews on social networking sites include anxiety, happiness, fear and so on. Sentiment analysis of the IMDb movie review website finds the general perspective of a review for the emotions exhibited by a reviewer concerning a movie. Most researchers are working on differentiating positive and negative reviews. In the proposed work, a contextualized word-embedding technique, Word2Vec, is used. It is trained on fifty thousand reviews given by IMDb movie users. The qualitative features are extracted using Word2Vec, which involves pretraining, and the quantitative features are extracted from LIWC without pretraining. Experiments on vector features with different dimensions using the Skip-Gram Method are performed, and LIWC extracts the quantitative linguistic and psychological features. The psychological features include positive emotion, negative emotion, anger, sadness and clout, which measures confidence level, from the reviews. The readability features include ARI, CLI and WC. Linguistic features include adjectives and adverbs. Both statistical and lexicon-based methods extract features to increase the model's accuracy.
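To make the statistical feature-extraction idea above concrete, the following minimal Python sketch shows how TF-IDF features could be computed for a pair of toy reviews with scikit-learn; it illustrates the general TF-IDF technique mentioned in the text, not the pipeline used in this paper, which relies on Word2Vec and LIWC features.

```python
# Illustrative TF-IDF feature extraction (the statistical approach described
# above). This is not the authors' pipeline, only a sketch of the technique.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "The hero plays a good role in this sci-fiction movie",
    "A dull plot and weak acting make this movie hard to enjoy",
]

# Each review becomes a sparse vector weighted by term frequency and
# inverse document frequency (TF-IDF).
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(reviews)

print(X.shape)                                  # (2, vocabulary size)
print(vectorizer.get_feature_names_out()[:10])  # first few vocabulary terms
```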
When the features are extracted from the reviews, different feature-selection techniques are applied that help to retain helpful features and eliminate the features that do not contribute to the effectiveness of sentiment classification of reviews [15,16]. The classification of sentiments of reviews determines the polarity of reviews and classifies them as positive or negative. ML and lexicon-based methods were used for sentiment analysis. ML methods have achieved high performance in academia as well as in industry. It is a fact that ML algorithms enable high classification performance, but data quality is important as well. Data quality can limit the performance of any ML algorithm regardless of how much data are used to train the ML classifiers [17].

2. Related Work

There are two types of user reviews: high-quality and low-quality. A high-quality review helps to participate in decision making, while a low-quality one reduces helpfulness for users. That is the reason it is necessary to consider the quality of reviews for large data platforms. To identify the quality of reviews, many researchers consider high-quality reviews and their helpfulness. Ordinal Logistic Regression (OLR) has been applied to application reviews from Amazon and Google Play with the feature of review length [18]. The Tobit regression analysis model has been applied to the dataset of TripAdvisor and Amazon book reviews using features of review length and word count [19]. The IMDb movie review dataset is selected for this research and serves as the dataset for sentiment classification. Multiple textual features are extracted using the Word2Vec model trained on reviews and LIWC, which in this research helps to improve the classification performance of the classifiers. The performance of sentiment analysis has been improved gradually over time by focusing on advanced ML algorithms, novel approaches and DL algorithms. Details are given briefly in Table 1, describing the papers that achieved the best performance concerning review sentiments using advanced algorithms. A DL algorithm, CNN-BLSTM, was applied to the dataset of IMDb reviews and compared with experiments on single CNN and BLSTM performance. In the dataset, words were converted into vectors and passed to the DL model [20]. Linear discriminant analysis on Naive Bayes (NB) was implemented and achieved lower accuracy using only the feature of sentiwords [21]. The Maximum Entropy algorithm was applied to the movie review dataset with features extracted by a hybrid feature-extraction method and achieved the highest accuracy compared to K Nearest Neighbor (KNN) and Naive Bayes (NB). The features used are just lexicon features: positive word count and negative word count [22]. The highest accuracy achieved for the IMDb dataset of online movie reviews was 89%, because fewer data were used: 250 movie review text documents for training purposes and 100 movie reviews for testing purposes.
Table 1. Summary of accuracy achieved on the IMDb dataset (model/approach; features; dataset; accuracy):
1. CNN, BLSTM and CNN-BLSTM hybrid [20]; word embedding into vectors; IMDb reviews; 82% without the pretrained model.
2. LDA on Naive Bayes [21]; SentiWordNet; IMDb reviews.
3. Maximum Entropy [22]; sentiment words with TF-IDF; IMDb reviews.
4. Naive Bayes [23]; heterogeneous features; movie reviews; 89%.
5. Naive Bayes, KNN [2]; word vector, sentiword; movie reviews.
6. Entailment as Few-Shot Learner [1]; word embedding into vectors; IMDb reviews (pretrained model).
7. Deep Convolutional Neural Network [24]; vector features; IMDb movie reviews.
8. LSTM [25]; vector features; IMDb movie reviews.
9. Neural Network [26]; lexicon features; IMDb reviews; 86%.

Heterogeneous features were extracted from the movie reviews to achieve the best performance for Naive Bayes [23]. There are also some other Amazon datasets publicly available with many non-textual features. Furthermore, many researchers have also worked on an Amazon dataset, analysing reviews using non-textual features, which include product features, user features and ratings [27,28]. The above literature concludes that, to improve the performance of the model, the features and the size of the dataset play a more important role; the use of an efficient algorithm alone is not sufficient to improve the performance. In one experiment, a dataset of 5331 positive and 5331 negative processed snippets or sentences was used, with the sentences labelled according to their polarity. The total number of sentences used for training purposes is 9595 sentences or snippets, and 1067 sentences are used to test the model. First, the pretrained Word2Vec is used for feature extraction and then a Convolutional Neural Network (CNN) is applied to these features extracted from Word2Vec. The Google News dataset contains 3 million words, on which Word2Vec is trained to achieve the embedding of words into vectors. Testing accuracy is reported on the test dataset [24]. In this paper, three datasets are used; the first dataset consists of 50 thousand reviews, of which 25 thousand are positive and 25 thousand are negative. The data are already separated into training and testing reviews in which the ratio of positive and negative reviews is the same. The first drawback of this experimentation is that the dataset is not randomly split for training and testing, which introduces bias into the paper. The second dataset used in the experiments is 200 movies, each having ten categories, from Douban Movies. The rating of movies was from 0 to 5. A movie rating of 1 to 2 was considered a negative review and a 3 to 5 movie rating was considered a positive review. The comments that had a rating of 3 were ignored. So, 6000 comments were used for training and the other 6000 were used for testing. The total number of comments obtained after removing neutral reviews was 12,000. The second drawback is that in this paper the ratio is 50:50, and most of the references show that 80:20 or 70:30 is the best ratio for splitting the dataset. For evaluating the classification performance, three classifiers are used for sentiment classification: NB, an extreme learning machine and LSTM. Before classification, the dataset is passed through Word2Vec for word embedding. The word vectors were sent to LSTM for classification, and the results show that LSTM performed better than the other classifiers [25]. The last reference mentioned in Table 1 shows that the accuracy achieved by the neural network (NN) is 86% using lexicon features.
This also applies to neural networks. In the IMDb dataset of movie reviews used in this research, reviews are normalized using the following steps. All the words of the reviews are converted from upper case to lower case. Secondly, numbers, special characters, punctuation marks and other diacritics are removed. White spaces included in the review are also removed. Finally, abbreviations are expanded and stop words in the reviews are removed. All the processing of reviews involved in the referenced paper is described above [26].

Word Embedding Using the Word2Vec Approach

When representing a word, the context in which the word is used matters a lot because it changes the semantics of the word. For example, consider the word 'bank'. One meaning of the word bank is a financial institution, and another is land alongside water. If the word 'bank' is used in a sentence with words such as treasury, government, interest rates, money, etc., we can understand its actual meaning from its context words. In contrast, if the context words are water, river, etc., the actual meaning in this case is land. Word embedding is one of the emerging and best techniques we know for representing text and is used in many fields such as NLP, biosciences, image processing, etc., to denote text using different models. Results using word embedding are shown in Table 2.

Table 2. Word2Vec results in other fields:
Image Processing [29]: 90% accuracy
Natural Language Processing Tasks [30]: more than 90% accuracy
Recommendation Tasks [31]: up to 95% accuracy
Biosciences [32]: more than 90% accuracy
Semantics Tasks [33]: more than 90% accuracy
Malware Detection Tasks [34]: up to 99% accuracy

Word embedding is nowadays the most important and efficient way of representing a text as vectors without losing its semantics. Word2Vec can capture the context of a word, semantic and syntactic similarity, its relation with other words, etc. Word2Vec was presented by Tomas Mikolov in 2013 at Google [35]. Word2Vec represents words in a vector space. The words in the review are represented in the form of vectors, and placement is carried out so that dissimilar words are located far away and words with similar meanings appear together in the vector space.

Proposed Methodology

For the proposed methodology, the hardware and software environment was set up as needed to perform the experiments. An HP laptop with a 4th-generation Core i5 and 8 GB RAM is used for experimentation. Google Colab is used as the Integrated Development Environment for the Python language in which we performed our experiments, and the latest Python libraries are used. The research methodology consists of four steps: dataset acquisition, feature engineering, models and evaluation, shown in Figure 1 below. Figure 1 shows that, after preprocessing, the data acquired from the IMDb movie review website are passed for feature engineering, which consists of three blocks, B, C and D. The B, C and D blocks are used independently as well as in hybrid form; the B and C, and B and D, blocks are named Hybrid-1 and Hybrid-2, respectively. Block E consists of 10-fold cross-validation, training and testing of different ML models, and the last step is the evaluation of the models. After extraction of features, each feature is normalized using the Min/Max Normalization technique. On the normalized features, 10-fold cross-validation is applied to remove bias.
Machine-learning (ML) and deep-learning (DL) models are trained and tested; these are Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN) and Bidirectional Gated Recurrent Unit (Bi-GRU). The results achieved after implementing the models on the features are then compared.

Figure 1. General diagram of the research workflow (blocks A-E: review dataset acquisition; Linguistic Inquiry and Word Count; Word2Vec model training and word embedding; pretrained GloVe model; Min/Max normalization, 10-fold stratified cross-validation, the SVM, NB, Random Forest, Logistic Regression, CNN and Bi-GRU models, and comparison on accuracy, recall, precision and F1 score).

Dataset Acquisition. The benchmark movie review dataset from IMDb is collected and is publicly available. The main dataset consists of 50,000 reviews with polarity labels. The ground rating is also available according to the 10-star rating from different customers. A review with a rating of less than 4 is a negative review, and a review with a score of more than seven is a positive review. The reviews are equally pre-divided into 25,000 positive reviews and 25,000 negative reviews. Each review is available as a text document; fifty thousand text documents containing reviews were downloaded.

Preprocessing for Feature Extraction. After downloading, each text document including reviews is preprocessed using the PyCharm IDE. All the reviews and their polarity are read and written into a Comma Separated Value (CSV) file in two columns: one column indicates the reviews and the second column indicates the polarity. Firstly, the review sentences are tokenized into words, and then all the special characters, stop words and extra spaces are removed from the review using the NLP toolkit library available in Python. The preprocessed reviews are written into the preprocess column of the CSV file for future use.

Data Preprocessing Tool. For data preprocessing, we use the PyCharm 2018 IDE and Python. The Natural Language Toolkit (NLTK) is used for text processing such as tokenization and stop word removal. Google Colab is used for implementing the DL algorithms because it provides GPU and TPU for fast processing.

Feature Extraction Using LIWC. LIWC consists of multiple dictionaries to analyze and extract the features. To extract psychological, textual and linguistic features from the movie review dataset, LIWC is used. First, the reviews are preprocessed and then used to extract features, as described in Figure 2. The flow is that the preprocessed reviews are sent to LIWC for feature extraction. LIWC compares each word of a review against its dictionaries to check which category the given review word belongs to. It calculates the percentage by counting the number of words in the review that belong to a specific category, dividing by the total number of words in the review and multiplying by 100, as described in Equation (1):

x = \frac{\text{number of words in the review that belong to the specific category}}{\text{total number of words in the review}} \times 100 \qquad (1)

where x denotes the specific subcategory of features in LIWC.
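The following minimal sketch illustrates the preprocessing steps and the Equation (1) percentage on a toy example. It is illustrative only: it assumes NLTK's English stop-word list, and a small hypothetical word set stands in for LIWC's proprietary category dictionaries.

```python
# Minimal sketch of the preprocessing and the Equation (1) category percentage.
# The tiny word set below is a hypothetical stand-in for a LIWC category.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def preprocess(review: str) -> list[str]:
    """Lower-case, strip special characters and stop words, collapse spaces."""
    review = review.lower()
    review = re.sub(r"[^a-z\s]", " ", review)          # drop punctuation/numbers
    return [w for w in review.split() if w not in STOP_WORDS]

# Hypothetical stand-in for one LIWC category (e.g., positive emotion).
POSITIVE_EMOTION = {"good", "great", "love", "excellent", "enjoy"}

def category_percentage(tokens: list[str], category: set[str]) -> float:
    """Equation (1): percentage of review words falling in a given category."""
    if not tokens:
        return 0.0
    return 100.0 * sum(t in category for t in tokens) / len(tokens)

tokens = preprocess("The hero plays a GOOD role; I really enjoyed this movie!")
print(tokens)
print(category_percentage(tokens, POSITIVE_EMOTION))
```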
The features calculated by LIWC are positive emotion (PE), negative emotion (NE), anger (Ang), sadness (Sad), clout, dictionary words (Dic), adverbs (Adv) and adjectives (Adj). PE, NE, Ang, Sad and Clout are categorized by LIWC as psychological features, while Adv and Adj are categorized as linguistic features.

Figure 2. Feature engineering with LIWC (block B): preprocessed reviews (lower-casing, removal of stop words, extra spaces and special characters, lemmatization) are passed to LIWC, which extracts the linguistic/summary-language features (adjective, adverb, dictionary words), the psychological features (positive emotion, negative emotion, anger, sadness, clout) and the readability features (ARI, CLI, word count).

After the extraction of features, Min/Max Normalization is applied and the features then pass through block E for further processing, including 10-fold cross-validation, training of ML models, testing of ML models and, finally, evaluation.

Figure 3. Skip-Gram word-embedding example for the sentence "In sci-fiction movie hero play good role ever" with window size 7 (input layer, hidden layer and output layer).

Readability Feature Extraction. The readability score of a review defines the effort required to understand its text. Three readability features are calculated on the preprocessed reviews: ARI, CLI and word count. ARI is used for measuring the readability of English text, and it is calculated using the standard formula given in Equation (2):

ARI = 4.71 \times \frac{C}{W} + 0.5 \times \frac{W}{S} - 21.43 \qquad (2)

where C represents characters (the count of letters and numbers in the review), W represents words (counted via the number of spaces in the review) and S represents the number of sentences in the review.

CLI scores define how difficult a text is to understand, and CLI is calculated using the standard formula given in Equation (3):

CLI = 0.0588L - 0.296S - 15.8 \qquad (3)

where L represents the average number of letters per 100 words and S represents the average number of sentences per 100 words, used to measure the understandability of a review.

Word count (WC) is calculated by Linguistic Inquiry and Word Count, which consists of multiple dictionaries, using Equation (4):

WordCount = N_{allwords} - N_{punctuation} - N_{stopwords} - N_{nonalpha} \qquad (4)

where N_{allwords} represents the total number of words in the review text, N_{punctuation} the number of punctuation characters, N_{stopwords} the number of stop words and N_{nonalpha} the number of non-alphabetic terms in the review text. After extraction, Min/Max Normalization is applied to each readability feature, as described in the next section.
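As an illustration, the sketch below computes the three readability features of Equations (2)-(4) for a single review. Tokenization and the stop-word list are simplified stand-ins for the NLTK/LIWC pipeline, and the ARI and CLI coefficients are the standard published ones used in the equations above.

```python
# Illustrative computation of the readability features ARI, CLI and WC
# (Equations (2)-(4)). Tokenization and the stop-word set are simplified
# stand-ins for the paper's NLTK/LIWC pipeline.
import re

STOP_WORDS = {"the", "a", "an", "is", "and", "in", "this", "i", "it"}

def readability_features(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\S+", text)
    C = sum(ch.isalnum() for ch in text)          # letters and digits
    W = max(len(words), 1)
    S = max(len(sentences), 1)

    # Equation (2): Automated Readability Index (standard coefficients).
    ari = 4.71 * (C / W) + 0.5 * (W / S) - 21.43

    # Equation (3): Coleman-Liau Index; L = letters per 100 words,
    # S100 = sentences per 100 words (standard coefficients).
    L = 100.0 * sum(ch.isalpha() for ch in text) / W
    S100 = 100.0 * S / W
    cli = 0.0588 * L - 0.296 * S100 - 15.8

    # Equation (4), approximated: drop punctuation-only, stop-word and
    # non-alphabetic tokens, then count what remains.
    wc = sum(1 for w in words if w.isalpha() and w.lower() not in STOP_WORDS)

    return {"ARI": ari, "CLI": cli, "WC": wc}

print(readability_features("In this sci-fiction movie the hero plays a good role. I enjoyed it!"))
```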
Word Embedding by Review-Based Training of the Word2Vec Model. The features of the movie reviews are extracted by training the Word2Vec neural model. The sequence of the feature-extraction process is given in Figure 4 below. Firstly, for training the Word2Vec neural model, the data are prepared using the IMDb movie review dataset of 50 thousand reviews; the total number of words included in this dataset is approximately 6 million. Each review is used in the training of the Word2Vec neural model, and three different embedding sizes are used in the experiments, 50, 100 and 150, with a context size of 10. There are two methods for training the Word2Vec neural model: one is CBOW (continuous bag of words) and the second is the Skip-Gram Method. We use the Skip-Gram Method, which focuses on less frequent words and gives good results for word embeddings of less frequent words. The Skip-Gram operations are given in Figure 3, which shows that the model is trained with a window size and Skip-Gram computes the word embedding. Instead of using context words as input to predict the center word, as the continuous bag-of-words model does, it uses the center word as input and predicts the center word's context words. For example, take "In sci-fiction movie hero play good role" with context size 7. Training instances are created such that "In" is the target word, which is the input, and the context words "sci-fiction movie hero play good role" are the output. The training instances are given in Table 3. Using the training samples defined in the table to train the neural network, a word embedding is generated for each word in the vocabulary. The trained model is saved, and the movie reviews are passed to these models to convert words into vectors. Three different types of vectors, with sizes of 50, 100 and 150, are created. For classification, the Word2Vec features obtained by the Skip-Gram Method are passed to block E.

Figure 4. Feature-extraction process with the self-trained Word2Vec model (block C): 50 thousand preprocessed reviews with a vocabulary of 6,142,469 words are used to train Word2Vec neural models with the Skip-Gram method at embedding sizes 50, 100 and 150 (context size 10), producing Vector50, Vector100 and Vector150.

Table 3. Skip-Gram training instances for the target word "In": (In, sci-fiction), (In, movie), (In, hero), (In, play), (In, good), (In, role), (In, ever).

Word Embedding by the Pretrained Word2Vec Model. The GloVe model is an unsupervised learning algorithm used for the vector representation of words. Training samples are taken from Wikipedia and different books, so the GloVe model is trained on a generalized kind of text. Figure 5 describes the steps for word embedding into vectors. The first step is to download the GloVe model as a zip file with 150-dimensional and 300-dimensional vectors. The pretrained GloVe model is loaded and tested on the preprocessed reviews. Each preprocessed review consists of words and is passed to the model as input; the output is a vector for each review, obtained by taking the average of its word vectors. Each review vector has 150 or 300 numbers. The output vectors are passed to block E for further processing, which includes 10-fold cross-validation, training of ML models, testing of ML models and evaluation.

Figure 5. Feature-extraction process with the pretrained GloVe model (block D): a 6-million-word vocabulary from Wikipedia and books, trained GloVe models with 150 and 300 dimensions, and review vectors obtained by taking the mean of word vectors (Vectors 150 and Vectors 300).
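A compact gensim-based sketch of the two embedding routes described above follows: training a Skip-Gram Word2Vec model on review tokens and collapsing a review into one vector by averaging its word vectors, as is also done for the pretrained GloVe vectors. The two-review corpus is only a placeholder for the 50,000 preprocessed IMDb reviews, and the parameter names follow gensim's API rather than anything stated in the paper.

```python
# Sketch of the review-based Word2Vec training (Skip-Gram) and of turning a
# review into a single vector by averaging its word vectors. The tiny corpus
# stands in for the 50,000 preprocessed IMDb reviews.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["sci-fiction", "movie", "hero", "play", "good", "role"],
    ["dull", "plot", "weak", "acting", "bad", "movie"],
]

# sg=1 selects the Skip-Gram method; the paper describes a context window of 10
# and embedding sizes of 50, 100 and 150.
w2v = Word2Vec(sentences=corpus, vector_size=50, window=10, sg=1, min_count=1)

def review_vector(tokens, model):
    """Average the vectors of in-vocabulary words to get one review vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

vec = review_vector(["hero", "play", "good", "role"], w2v)
print(vec.shape)   # (50,)
```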
Evaluation and Dataset. The dataset selected for the experiment is IMDb movie reviews, consisting of 50,000 reviews of different movies with sentiment polarity. The reason for this dataset selection is that it contains the largest number of reviews compared to previously uploaded movie review datasets on the website (accessed on 4 April 2022). A total of 25,000 reviews are positive and the other 25,000 reviews are negative. Each review is in a text file, so the zip file includes 50,000 text files with their rating value from 1 to 10 as the text filename.

Feature Exploration and Hypothesis Testing. In this subsection, the linguistic, psychological and readability features extracted from the reviews and used in the sentiment-based review classification are explored. A summary of the descriptive statistics of the features under each category (linguistic, psychological and readability) is provided in Table 4. This summary includes the number of data records (N), mean, median, standard deviation (SD), maximum (Max) and minimum (Min) values of the features under each category. Moreover, the significance of the features in the three categories is examined using hypothesis testing. In order to select the right significance test, the normality of the features is examined. To obtain a sense of the distributions of the features and the outcome variable, histograms and associated distribution curves are plotted, as depicted in Figure 6. It is noteworthy that only CLI has a well-behaved bell-shaped distribution curve, while all other features do not. To confirm this observation, normal probability plots for all features are provided in Figure 7. A normal probability plot demonstrates the deviation of a record distribution from normality. It is observed that the Adv, Adj and Clout distributions deviate slightly from the normal distribution; however, all other feature distributions except CLI are not normally distributed. To investigate the association between input features, a correlation matrix is computed. Since the probability distributions of most features are not Gaussian, it is not possible to use the Pearson correlation to check the relationship between features. In contrast, the Spearman correlation coefficient is an efficient tool to quantify the monotonic relationship between continuous variables that are not normally distributed [36]. As this is the case for our input features, the Spearman correlation has been adopted in this study to quantify the association between the features. A heat map of the Spearman correlation coefficients is created and presented in Figure 8; the circle size indicates the strength of the bivariate correlation. The map in Figure 8 reveals a strong relationship between anger and negative emotions and between the ARI and CLI features, and a moderate association between NE and sadness and between Dic and ARI and CLI. However, the map shows weaker associations between the other input features. As the outcome, the polarity class, is a categorical variable, the correlation coefficient is not an adequate tool to measure its association with the input features. Therefore, Binomial Logistic Regression (LR) has been adopted to investigate this association. Logistic Regression assesses the likelihood of an input feature being linked to a discrete target variable [37]. The input features do not exhibit high multicollinearity, as deduced from the correlation matrix plot of Figure 8, which makes LR a suitable test of association for our problem. Table 5 displays the output of a Binomial Logistic Regression model that was fitted to predict the outcome based on the linguistic, psychological and readability feature values. The p-values and significance levels for each of the regression model's coefficients are listed in Table 5. The asterisks denote the level of the feature's significance; more asterisks imply a higher level of importance, following the usual convention that three asterisks denote the smallest p-values, two a moderately small p-value, one a marginal p-value and no asterisk a non-significant p-value. As shown in Table 5, the p-values for PE, NE, Ang, Sad, Clout, Adj and CLI indicate that these features are statistically significant for the polarity class.
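The association checks described above can be reproduced in outline as follows. The data frame holds random stand-in values, and only the column names mirror the paper's features; this is a sketch of the general procedure (Spearman correlation plus a binomial logistic regression), not the authors' analysis script.

```python
# Illustration of the two association checks: Spearman correlation between
# (non-Gaussian) input features, and a binomial logistic regression of the
# polarity class on the features. The data are random stand-ins; only the
# column names (PE, NE, CLI) mirror the paper's features.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PE": rng.gamma(2.0, 1.0, 500),
    "NE": rng.gamma(2.0, 1.0, 500),
    "CLI": rng.normal(10.0, 2.0, 500),
})
df["polarity"] = (df["PE"] - df["NE"] + rng.normal(0, 1, 500) > 0).astype(int)

# Spearman rank correlation matrix between the input features.
rho, pval = spearmanr(df[["PE", "NE", "CLI"]])
print(np.round(rho, 2))

# Binomial logistic regression: the p-values indicate feature significance.
X = sm.add_constant(df[["PE", "NE", "CLI"]])
model = sm.Logit(df["polarity"], X).fit(disp=0)
print(model.summary2().tables[1][["Coef.", "P>|z|"]])
```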
Figure 6. Histograms and probability distribution curves for the linguistic, psychological and readability features and the polarity class.

Figure 7. Normal probability plots for the linguistic, psychological and readability features and the polarity class variables.

Table 4. Descriptive statistics summary (mean, median, SD, Max, Min; N = 2000 records per feature) of the features PE, NE, Ang, Sad, Clout, Dic, Adv, Adj, WC, ARI and CLI and the polarity class.

Table 5. Statistical significance of the linguistic, psychological and readability features from the Binomial Logistic Regression model (coefficient, standard error, t-statistic, p-value and significance level for each feature).

Figure 8. Correlation coefficient matrix of the linguistic, psychological and readability features.

A Chi-square hypothesis test is conducted to verify the sufficiency of the LR model for testing feature significance. The null hypothesis of the test, H0, assumes that there is no relationship between the response variable, the polarity, and any of the input features, i.e., all model coefficients except the intercept are zero. On the other hand, the alternative hypothesis, H1, implies that if any of the predictors' coefficients is not zero, then the learning model is considered efficient. The Chi-square test of the model, with 1988 degrees of freedom on 2000 observations over all features, indicates that the LR model differs statistically from a constant model with only the intercept term and can be considered an adequate test of feature significance. As a result, the null hypothesis can be rejected, and the association of the input features with the polarity of a review is confirmed. As depicted in Table 5, the binomial LR reveals that all psychological features are significant; however, only Adj from the linguistic features and CLI from the readability features are significant. Therefore, only the significant features are used for review classification in this work.

Evaluation Measure and Performance Comparison. The evaluation of the deep-learning and conventional models is carried out by calculating the performance measures accuracy, precision, recall and F-Measure. These performance measures are calculated on the basis of a confusion matrix; the details of the confusion matrix are given below.

Confusion Matrix. A confusion matrix, also known as an error matrix, is used for measuring the performance of a classification model. A confusion matrix is represented in Figure 9. When a review is an actual negative and the model predicts it as positive, it is called a false positive (FP). When a review is an actual positive and the model predicts it as positive, it is called a true positive (TP). When a review is an actual positive and the model predicts it as negative, it is called a false negative (FN). When a review is an actual negative and the model predicts it as negative, it is called a true negative (TN).

Figure 9. Confusion matrix (predicted vs. actual values: TP, FP, FN, TN).
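A sketch of the evaluation loop (block E) is shown below, assuming the feature matrix X already holds the review vectors (Word2Vec and/or LIWC features). It applies Min/Max normalization, 10-fold stratified cross-validation and an SVM, and reports the four confusion-matrix-based measures via scikit-learn; it is an illustration under those assumptions, not the authors' exact experimental code.

```python
# Sketch of block E: Min/Max normalization, 10-fold stratified cross-validation,
# an SVM classifier and the four confusion-matrix-based measures.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_validate

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 150))       # stand-in for 150-dim review vectors
y = rng.integers(0, 2, size=200)      # stand-in polarity labels

# Scaling inside the pipeline keeps the normalization fold-local.
pipeline = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(
    pipeline, X, y, cv=cv,
    scoring=["accuracy", "precision", "recall", "f1"],
)
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, round(scores[f"test_{metric}"].mean(), 3))
```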
Pretrained Word Embedding. The pretrained word embedding (GloVe) was tested with two different word-vector dimensions, 150 and 300. The six ML classifiers are used with the 150-dimensional word vectors, and each word vector is tested. The experiments with 150- and 300-dimensional word vectors and their results are shown in Tables 6 and 7.

Table 6. Results of the pretrained model with vector dimension 150 (accuracy, precision, recall and F-Score for Multi-Layer Perceptron, K Nearest Neighbor, Random Forest, Naive Bayes and Support Vector Machine).

Table 7. Results of the pretrained model with vector dimension 300 (average training accuracy and average testing accuracy for CNN and Bi-GRU).

After preprocessing, the movie review dataset is passed to 10-fold stratified cross-validation for unbiased splitting of the dataset. The GloVe pretrained model is used for the feature-engineering process. The 150 dimensions of the GloVe pretrained model are used as features for the ML models. The six ML algorithms are applied, and SVM achieves the best results with respect to the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM, which reflects the impact of the pretrained GloVe model with 150-dimensional feature vectors. The ML algorithms perform better on the 150-dimensional vectors. For MLP, three layers are used with 20 neurons at each layer to predict review polarity. The impact of the pretrained GloVe model with 300 dimensions is represented in Table 7. The two DL models are applied to features having a vector dimension of 300; the models used are CNN and Bi-GRU, and the best testing accuracy is achieved with Bi-GRU. The lowest dimension of the pretrained model, 150, has a higher impact on the results using the traditional ML algorithms compared to the 300 dimensions used with the DL models.

Review-Based Trained Word2Vec Model Word Embedding. The reviews are embedded into vectors with three different word-vector dimensions: 50, 100 and 150. Then, the ML and DL algorithms are applied to the different vector sizes independently and evaluated. The results are shown in Table 8 based on the evaluation measures.

Table 8. Evaluation of the model trained on reviews with 50-dimensional word vectors (accuracy, precision, recall and F-Score for Naive Bayes, Random Forest, Support Vector Machine and the other applied classifiers).

The 50-dimensional Word2Vec model is self-trained on movie reviews. The self-trained model is then used for word embedding of the movie reviews into vectors representing the meaning of each word. Then, the six ML algorithms are applied. SVM achieves the best results compared to the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved using SVM with the 50-dimensional word embedding, which reflects the impact of the self-trained model with a smaller number of dimensions. In Table 9, the 100-dimension parameter of the self-trained model is evaluated using a confusion matrix.

Table 9. Evaluation of the self-trained (non-pretrained) model with 100-dimensional word vectors (accuracy, precision, recall and F-Score for Naive Bayes, K Nearest Neighbor, Random Forest, Support Vector Machine and the other applied classifiers).

The 100-dimensional Word2Vec model is self-trained on movie reviews. After the model is self-trained, it is used for word embedding of the movie reviews into vectors representing the meaning of each word. Then, the six ML algorithms are applied. SVM achieves the best results compared to the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure.
In Table 10, the impact of 150 dimensions of the self-trained model is evaluated.

Table 10. Model trained on reviews with a 150 word vector dimension, without psychological, linguistic and readability features: accuracy, precision, recall and F-score for Naive Bayes, K-Nearest Neighbor, Random Forest and Support Vector Machine.

The 150-dimension Word2Vec model is self-trained on the movie reviews. First, the context size of the model is set to 10 and the Skip-Gram method is used to train the Word2Vec model. Once the model is self-trained, it is used for word embedding of the movie reviews into vectors representing the meaning of each word. Then, the six ML algorithms are applied. SVM achieves the best results compared with the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM with the 150-dimension word embedding, which reflects the impact of the self-trained model with a higher number of dimensions than the previous 50- and 100-dimension results. In Table 11, the impact of the 150-dimension self-trained model in addition to the psychological, linguistic and readability features is presented. The 150-dimension self-trained model with the proposed features is considered because it shows better results than the pretrained GloVe model. The psychological features are extracted using LIWC. The psychological features used in this experiment are positive emotion, negative emotion, anger, sadness, clout and dictionary words. The CLI readability feature is used because it gave a better result in the previous experiment.

Table 11. Model trained on reviews with a 150 word vector dimension, with psychological, linguistic and readability features: accuracy, precision, recall and F-score for Naive Bayes, K-Nearest Neighbor, Random Forest and Support Vector Machine.

Then, the six ML algorithms are applied. SVM achieves the best results with respect to the other algorithms (NB, RF, LR, KNN and MLP) on the accuracy, precision, recall and F-Measure evaluation measures. The highest F-Measure score is achieved using SVM; the psychological, linguistic and readability features improve the evaluation scores. Table 12 shows the impact of 300 dimensions of the self-trained model relative to the results on 150-dimension word embeddings with psychological and readability features.

Table 12. Results on 300-dimension word embeddings without psychological and readability features: training accuracy average and testing accuracy average for CNN (2 layers) and Bi-GRU.

Table 12 presents the evaluation result of the two DL algorithms applied to 300-dimension vectors without psychological and readability features. The impact on accuracy of the 300 dimensions of the self-trained model is higher than that of the 300 dimensions of the pretrained model. The results show that context-based embedding gives higher results than global-based embedding. The applied models are CNN, with two layers of 32 and 64 neurons, respectively, and Bi-GRU. Bi-GRU has two gates: one is an update gate and the other is a reset gate. The update gate is used to retain memory and the reset gate is used to forget memory. The best results are achieved with Bi-GRU, whose testing accuracy is higher than that obtained with the pretrained GloVe model. The evaluation results of the two DL algorithms applied to 300-dimension word vectors with psychological and readability features are given in Table 13.

Table 13. Results on 300-dimension word embeddings with psychological, linguistic and readability features: training accuracy average and testing accuracy average for CNN and Bi-GRU.
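The two deep-learning baselines described above might look like the following sketch. This is not the authors' code: the framework (TensorFlow/Keras), vocabulary size, sequence length and GRU unit count are assumptions, and in the actual experiments the embedding layer would be initialised from the self-trained Word2Vec or pretrained GloVe vectors rather than learned from scratch.

```python
# Minimal sketch (not the authors' code) of the two DL baselines: a 2-layer CNN
# (32 and 64 filters, following the text) and a bidirectional GRU. Vocabulary size,
# sequence length and GRU units are illustrative assumptions.
import tensorflow as tf

VOCAB_SIZE = 20000   # assumption
MAX_LEN = 400        # assumption
EMBED_DIM = 300      # 300-dimension vectors, as in the experiments above

def build_bigru():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64)),  # update/reset gates live inside the GRU cell
        tf.keras.layers.Dense(1, activation="sigmoid"),          # positive vs negative review
    ])

def build_cnn():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_bigru()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```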
The psychological features are extracted using LIWC. The psychological features used in this experiment are positive emotion, negative emotion, anger, sadness, clout and dictionary words. The CLI readability feature is used because it gave a better result in the previous experiment. The applied models are CNN, with two layers of 32 and 64 neurons, and Bi-GRU, which has two gates: an update gate and a reset gate. The update gate is used to retain the memory and the reset gate is used to forget the memory. Bi-GRU achieves the best results, with a higher testing accuracy than the pretrained GloVe model. In Table 14, a comparison is given between the proposed work and previous work based on the evaluation measures.

Table 14. Comparison of the F-Measure of the proposed work with previous work: review-based trained Word2Vec with a Support Vector Machine (this work), Word2Vec [16] with CNN-BLSTM, Word2Vec [22] with LSTM, and [18] with Maximum Entropy.

An analysis of the results following the experiments is given below.
• The self-trained Word2Vec model on movie reviews with the 150-dimension parameter has a higher impact on performance than the pretrained GloVe model.
• The CLI readability feature achieved the highest score compared with ARI and WC (the sketch below shows how CLI and ARI are computed from raw text).
• The SVM algorithm performs better than the other applied algorithms NB, LR, RF, CNN, KNN and MLP.
• The use of the psychological features and the readability feature CLI to classify reviews with self-trained embeddings improves the performance beyond 86%.
• The smaller word embedding dimension of 150 performs better for the traditional ML algorithms, while for the DL algorithms 300 dimensions gives the better result.
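The two readability measures compared in the analysis above can be computed directly from raw counts. The sketch below is a simplified illustration, assuming a naive regex tokeniser rather than the tool the authors used; the CLI and ARI formulas themselves are the standard definitions.

```python
# Minimal sketch of the Automated Readability Index (ARI) and the Coleman-Liau
# Index (CLI), computed from letter, word and sentence counts with a naive
# tokeniser. The splitting rules are simplifications for illustration only.
import re

def counts(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w) for w in words)
    return letters, max(len(words), 1), max(len(sentences), 1)

def ari(text):
    letters, words, sentences = counts(text)
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43

def cli(text):
    letters, words, sentences = counts(text)
    L = letters / words * 100       # average letters per 100 words
    S = sentences / words * 100     # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

review = "The plot drags badly. The acting, however, is superb and saves the film."
print(round(ari(review), 2), round(cli(review), 2))
```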
Conclusions
Classification of opinion mining of reviews is open research due to the continuous increase in available data. Many approaches have been proposed to achieve classification of movie reviews. After a critical analysis of the literature, we observe that words are converted into vectors for sentiment classification of movie reviews by different approaches, including TF-IDF and Word2Vec. The pretrained Word2Vec model is commonly used for word embedding into vectors, and mostly generalized data are used to train the Word2Vec model for extracting features from reviews. We instead extract features by training the Word2Vec model on specific data related to 50 thousand reviews; for review classification, the Word2Vec model is trained on the reviews themselves, whereas most researchers have used a generalized trained model as an alternative. This research work extracts features from movie reviews using a review-based trained Word2Vec model and LIWC. The review-based training data have some characteristics: they include a vocabulary of 6 million words and are specific to movie reviews, i.e., to the task of sentiment classification of reviews. The six ML algorithms are applied, and SVM achieves the best F-Measure result with respect to the other algorithms NB, RF, LR, KNN and MLP. Two DL algorithms are also applied: one is CNN and the other is Bi-GRU. Bi-GRU achieved a higher score than CNN. The results lead to the conclusion that a model trained on the task-specific data performs better than a model trained on generalized data. For the ML algorithms, 150 features perform better than 50 and 100 features on the movie review dataset used; for the DL models, 300 feature vectors give better classifications than 150 feature vectors. Significant psychological, linguistic and readability features aided in improving the classification performance of the classifiers used. SVM achieved its best F-Measure with a 150-dimension word vector size, and Bi-GRU achieved the same F-Measure score using a 300-dimension word vector size. We applied both traditional ML and DL algorithms for the classification of reviews. Both achieved nearly the same results on the performance measures, which indicates that the IMDb dataset of 50,000 movie reviews is not large enough for applying a DL algorithm. In future work, a larger dataset is needed to apply the DL algorithms and increase the classification performance of the classifiers.

Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing (original draft preparation) and writing (review and editing), Muhammad Shehrayar Khan, Muhammad Saleem Khan and co-authors; supervision, project administration and funding acquisition were shared among the authors. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The data considered for this research are original and were collected by the authors for generating insights. Moreover, the data mining and ML tools used for this research are freely available, and the models were built in accordance with our own requirements.

Conflicts of Interest: The authors declare that there is no conflict of interest related to this work.

References
1. Wang, S.; Fang, H.; Khabsa, M.; Mao, H.; Ma, H. Entailment as Few-Shot Learner. arXiv 2021.
2. Nguyen; Dao. Sentiment Analysis of Movie Reviews Using Machine Learning Techniques. In Proceedings of the Sixth International Congress on Information and Communication Technology, London, UK, 25–26 February 2021; Springer: Berlin, Germany, 2022; pp. 361–.
3. U.; Khan, S.; Rizwan, A.; Atteia, G.; Jamjoom; Samee. Aggression Detection in Social Media from Textual Data Using Deep Learning Models. Appl. Sci. 2022, 12, 5083.
4. T.; Faisal; Rizwan, A.; Alkanhel, R.; Khan; Muthanna, A. Efficient Fake News Detection Mechanism Using Enhanced Deep Learning Model. Appl. Sci. 2022, 12, 1743.
5. Rizwan, A.; Iqbal, K.; Fasihuddin, H.; Banjar, A.; Daud, A. Prediction of Movie Quality via Adaptive Voting Classifier. IEEE Access 2022, 10, 81581–81596.
6. A.; Abbas, Y.; Ahmad, T.; Mahmoud; Rizwan, A.; Samee. A Healthcare Paradigm for Deriving Knowledge Using Online Consumers' Feedback. Healthcare 2022, 10, 1592.
7. A.; Agrawal, A.; Rath. Classification of sentiment reviews using n-gram machine learning approach. Expert Syst. Appl. 2016, 57, 117–126.
8. Mohamed; Haggag. A survey on opinion summarization techniques for social media. Future Comput. Inform. J. 2018, 3, 82–109.
9. I.; Varma; Govardhan, A. Preprocessing the informal text for efficient sentiment analysis. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 2012, 1, 58–.
10. Shenoy; Mohan. Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web 2017, 20, 135–154.
11. B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. arXiv 2002.
12. Bakar; Yaakub. A review of feature selection techniques in sentiment analysis. Intell. Data Anal., 159–189.
13. M.; Harish, B. A New Feature Selection Method based on Intuitionistic Fuzzy Entropy to Categorize Text Documents. Int. J. Interact. Multimed. Artif. Intell. 2018, 5, 106.
14. M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-based methods for sentiment analysis. Comput. Linguist., 267–307.
15. A.; Zhang, D.; Levene, M. Combining lexicon and learning based approaches for concept-level sentiment analysis. In Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, Beijing, China, 12 August 2012; pp. 1–.
16. L.; Wang, H.; Gao, S. Sentiment feature selection for sentiment analysis of Chinese online reviews. Int. J. Mach. Learn. Cybern. 2018, 9, 75–84.
17. S.; Kar; Baabdullah, A.; Al-Khowaiter. Big data with cognitive computing: A review for the future. Int. J. Inf. Manag. 2018, 42, 78–89.
18. Fink, L.; Rosenfeld, L.; Ravid, G. Longer online reviews are not necessarily better. Int. J. Inf. Manag. 2018, 39, 30–37.
19. L.; Goh; Jin, D. How textual quality of online reviews affect classification performance: A case of deep learning sentiment analysis. Neural Comput. Appl. 2020, 32, 4387–4415.
20. Z. Sentiment Analysis of Movie Reviews based on Machine Learning. In Proceedings of the 2020 2nd International Workshop on Artificial Intelligence and Education, Montreal, QC, Canada, 6–8 November 2020; pp. 1–.
21. Karim, M.; Das, S. Sentiment analysis on textual reviews. IOP Conf. Ser. Mater. Sci. Eng. 2018, 396, 012020.
22. H.; Harish, B.; Darshan, H. Sentiment Analysis on IMDb Movie Reviews Using Hybrid Feature Extraction Method. Int. J. Interact. Multimed. Artif. Intell. 2019, 5, 109–114.
23. R. Sentiment analysis of movie reviews using heterogeneous features. In Proceedings of the 2018 2nd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), Kolkata, India, 4–5 May 2018; pp. 1–.
24. Chaurasia, S.; Srivastava. Sentiment short sentences classification by using CNN deep learning model with fine tuned Word2Vec. Procedia Comput. Sci. 2020, 167, 1139–1147.
25. Liu; Luo, X.; Wang, L. An LSTM approach to short text sentiment classification with word embeddings. In Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018), Hsinchu, Taiwan, 4–5 October 2018; pp. 214–.
26. Z.; Zulfiqar; Xiao, C.; Azeem, M.; Mahmood, T. Sentiment analysis on IMDB using lexicon and neural networks. SN Appl. Sci. 2020, 2, 1–10.
27. A.; Mukhopadhyay, S.; Panigrahi; Goswami, S. Utilization of oversampling for multiclass sentiment analysis on Amazon review dataset. In Proceedings of the 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), Morioka, Japan, 23–25 October 2019; pp. 1–.
28. A.; Akhilesh, V.; Aich, A.; Hegde, C. Sentiment analysis of restaurant reviews using machine learning techniques. In Emerging Research in Electronics, Computer Science and Technology; Springer: Berlin, Germany, 2019; pp. 687–.
29. Ghosh; Valveny, E.; Harit, G. Beyond visual semantics: Exploring the role of scene text in image understanding. Pattern Recognit. Lett. 2021, 149, 164–171.
30. L.; Wang, G.; Zuo, Y. Research on patent text classification based on Word2Vec and LSTM. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 1, pp. 71–.
31. Q.; Dong, H.; Wang, Y.; Cai, Z.; Zhang, L. Recommendation of crowdsourcing tasks based on Word2Vec semantic tags. Wirel. Commun. Mob. Comput. 2019, 2019, 2121850.
32. Peña; Breis; San Román, I.; Barriuso; Baraza. Snomed2Vec: Representation of SNOMED CT terms with Word2Vec. In Proceedings of the 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 5–7 June 2019; pp. 678–.
33. A.; Khatua, A.; Cambria, E. A tale of two epidemics: Contextual Word2Vec for classifying twitter streams during outbreaks. Inf. Process. Manag. 2019, 56, 247–257.
34. T.; Mao, Q.; Lv, M.; Cheng, H.; Li, Y. Droidvecdeep: Android malware detection based on Word2Vec and deep belief network. KSII Trans. Internet Inf. Syst. (TIIS) 2019, 13, 2180–.
35. T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013.
36. C.; Dehon, C. Influence functions of the Spearman and Kendall correlation measures. Stat. Methods Appl., 497–515.
37. Collett, D. Modelling Binary Data; CRC Press: Boca Raton, FL, USA, 2002.

Home healthcare agencies (HHCAs) provide clinical care and rehabilitation services to patients in their own homes. The organization's rules regulate several connected practitioners, doctors, and licensed skilled nurses. Frequently, it monitors a physician or licensed nurse for the facilities and keeps track of the health histories of all clients. HHCAs' quality of care is evaluated using Medicare's star ratings for in-home healthcare agencies. The advent of technology has extensively evolved our living style; online businesses' ratings and reviews are the best representatives of organizations' trust, services, quality, and ethics. Using data mining techniques to analyze HHCAs' data can help to develop an effective framework for evaluating the finest home healthcare facilities. As a result, we developed an automated predictive framework for obtaining knowledge from patients' feedback using a combination of statistical and machine learning techniques. HHCAs' data contain twelve performance characteristics that we are the first to analyze and depict. After adequate pattern recognition, we applied binary and multi-class approaches on similar data with variations in the target class. Four prominent machine learning models were considered: SVM, Decision Tree, Random Forest, and Deep Neural Networks. In the binary-class setting, the Deep Neural Network model presented the most promising performance, while in the multi-class setting the Random Forest model showed the most significant outcome. Additionally, variable significance is derived from investigating each attribute's importance in predictive model building. The implications of this study can support various stakeholders, including public agencies, quality measurement bodies, healthcare inspectors, and HHCAs, to boost their performance. Thus, the proposed framework is useful for putting valuable insights into action.

Retrieval from huge social web data is a challenging task for conventional search engines. Recently, information filtering recommender systems may help to find movies; however, their services are limited because they do not consider movie quality aspects in detail. Prediction of movies can be improved by using the characteristics of social web content about a movie, such as social quality, tag quality, and temporal aspects. In this paper, we have proposed to utilize several features of social quality, user reputation and temporal features to predict popular or highly rated movies. Moreover, an enhanced optimization-based voting classifier is proposed to improve the performance on the proposed features.
A voting classifier uses the knowledge of all the candidate classifiers but ignores the performance of each model on different classes. In the proposed model, a weight is assigned to each model based on its performance for each class. For the optimal selection of weights for the candidate classifiers, a Genetic Algorithm is used, and the proposed model is called the Genetic Algorithm Voting (GA-V) classifier. After labeling the suggested features by using a fixed threshold, several classifiers, like Bayesian logistic regression, Naïve Bayes, BayesNet, Random Forest, SVM, Decision Tree, LSTM and AdaboostM1, are trained on the MovieLens dataset to find high-quality/popular movies in different categories. All the traditional ML models are compared with GA-V in terms of precision, recall and F1 score. The results show the significance of the proposed features and of the proposed GA-V classifier.

It is an undeniable fact that people excessively rely on social media for effective communication. However, there is no appropriate barrier as to who becomes a part of the communication. Therefore, unknown people ruin the fundamental purpose of effective communication with irrelevant, and sometimes aggressive, messages. As its popularity increases, its impact on society also increases, from primarily being positive to negative. Cyber aggression is a negative impact; it is defined as the willful use of information technology to harm, threaten, slander, defame, or harass another person. With increasing volumes of cyber-aggressive messages, tweets, and retweets, there is a rising demand for automated filters to identify and remove these unwanted messages. However, most existing methods only consider NLP-based feature extractors such as TF-IDF and Word2Vec, with a lack of consideration for emotional features, which makes them less effective for cyber aggression detection. In this work, we extracted eight novel emotional features and used a newly designed deep neural network with only three layers to identify aggressive statements. The proposed DNN model was tested on the Cyber-Troll dataset. The combination of word embedding and eight different emotional features was fed into the DNN for a significant improvement in recognition, while keeping the DNN design simple and computationally less demanding. When compared with the state-of-the-art models, our proposed model achieves an F1 score of 97%, surpassing the competitors by a significant margin.

The spreading of accidental or malicious misinformation on social media, specifically in critical situations such as real-world emergencies, can have negative consequences for society. This facilitates the spread of rumors on social media. On social media, users share and exchange the latest information with many readers, including a large volume of new information every second. However, the updated news shared on social media is not always reliable. In this study, we focus on the challenges of numerous breaking-news rumors propagating on social media networks rather than long-lasting rumors. We propose new social-based and content-based features to detect rumors on social media networks. Furthermore, our findings show that our proposed features are more helpful in classifying rumors compared with state-of-the-art baseline features. Moreover, we apply a bidirectional LSTM-RNN on text for rumor prediction. This model is simple but effective for rumor detection. The majority of early rumor detection research focuses on long-running rumors and assumes that rumors are always false.
In contrast, our experiments on rumor detection are conducted on a real-world scenario data set. The results of the experiments demonstrate that our proposed features and different machine learning models perform best when compared to the state-of-the-art baseline features and classifiers in terms of precision, recall, and F1 score.

With the growth of social networking web users, people share their ideas and opinions daily in the form of texts, images, videos, and speech. Text categorization is still a crucial issue because these huge texts are received from heterogeneous sources and from people with different mindsets. The shared opinions may be incomplete, inconsistent, noisy and also in different languages. Currently, NLP and deep neural network methods are widely used to solve such issues. In this way, Word2Vec word embedding and the Convolutional Neural Network (CNN) method are implemented for effective text classification. In this paper, the proposed model cleans the data, generates word vectors from a pre-trained Word2Vec model, and uses a CNN layer to extract better features for short sentences.

To find out what other people think has always been an essential part of information-gathering behavior. In the case of movies, movie reviews can provide an intricate insight into the movie and can help decide whether it is worth spending time on. However, with the growing amount of data in reviews, it is quite prudent to automate the process, saving time. Sentiment analysis is an important field of study in machine learning that focuses on extracting information about a subject from textual reviews. The area of sentiment analysis is closely related to natural language processing and text mining. It can successfully be used to determine the attitude of a reviewer with regard to various topics or the overall polarity of a review. In the case of movie reviews, along with giving a numeric rating to a movie, they can enlighten us on the favorableness or otherwise of a movie quantitatively; a collection of those then gives us a comprehensive qualitative insight into different facets of the movie. Opinion mining from movie reviews can be challenging because human language is rather complex, leading to situations where a positive word has a negative connotation and vice versa. In this study, the task of opinion mining from movie reviews has been achieved with the use of neural networks trained on the "Movie Review Database" issued by Stanford, in conjunction with two big lists of positive and negative words. The trained network managed to achieve a final accuracy of 91%.

Sentiment analysis is the interpretation and classification of emotions and opinions from text. The scale of emotions and opinions can vary from positive to negative and may be neutral. Customer sentiment analysis helps businesses to point out the public's thoughts and feelings about their products, brands, or services in online conversations and feedback. Natural language processing and text classification are crucial for sentiment analysis; that means we can predict or classify customers' opinions given their comments. In this paper, we perform sentiment analysis on two different movie review datasets using various machine learning techniques, including decision tree, naïve Bayes, support vector machine, blending, voting, and recurrent neural networks (RNN). We propose a few frameworks of sentiment classification using these techniques on the given datasets.
Several experiments are conducted to evaluate these frameworks and compare them with Stanford CoreNLP, an outstanding natural language processing tool at present. The experimental results have shown that our proposals can achieve higher performance; in particular, the voting and RNN-based classification models produce better results.

Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment the contextual semantic representation with scene text cues to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous scene text recognition, we also apply query-based attention to the text channel. We show that our multi-channel approach, involving contextual semantics and scene text, improves upon the absolute accuracy of the current state-of-the-art methods on the Advertisement Images Dataset in the relevant statement retrieval task and by 5% in the topic classification task.

Due to the development of e-commerce and web technology, most online merchant sites allow customers to write comments about the products they purchase. Customer reviews express opinions about products or services, and these are collectively referred to as customer feedback data. Opinion extraction about products from customer reviews is becoming an interesting area of research, and it motivates the development of automatic opinion mining applications for users. Therefore, efficient methods and techniques are needed to extract opinions from reviews. In this paper, we propose a novel idea to find opinion words or phrases for each feature from customer reviews in an efficient way. Our focus in this paper is to get the patterns of opinion words/phrases about the features of a product from the review text through adjectives, adverbs, verbs, and nouns. The extracted features and opinions are useful for generating a meaningful summary that can provide a significant informative resource to help the user, as well as merchants, to track the most suitable choice of product.

Introduction
Much of the existing research on textual information processing has been focused on mining and retrieval of factual information. Little work had been done on the process of mining opinions until only recently. Automatic extraction of customers' opinions can benefit both customers and manufacturers. Product review mining can provide effective information, such as classifying customer reviews as "recommended" or "not recommended" based on customers' opinions for each product feature. In this case, customer reviews highlight opinions about product features from various merchant sites. However, many reviews are long and only a few sentences contain opinions about product features. For a popular product, the number of reviews can be in the hundreds or even thousands, which is difficult to read one by one. Therefore, automatic extraction and summarization of opinions are required for each feature.
Actually, when a user expresses an opinion about a product, he/she states something about the product as a whole or about its features one by one. Feature identification in a product is the first step of an opinion mining application, and opinion word extraction is the second step, which is critical to generating a useful summary by classifying the polarity of the opinion for each feature. Therefore, we have to extract an opinion for each feature of a product. In this paper, we take a written review as input and produce a summary review as output. Given a set of customer reviews of a particular product, we need to perform the following tasks: (1) identifying the product features that customers commented on; (2) extracting opinion words or phrases through adjectives, adverbs, verbs, and nouns and determining their orientation; (3) generating the summary. We use a part-of-speech tagger to identify phrases in the input text that contain adjectives, adverbs, verbs, or nouns as opinion phrases. A phrase has a positive semantic orientation when it has good associations ("awesome camera") and a negative semantic orientation when it has bad associations ("low battery"). The rest of the paper is organized as follows. Section 2 describes the related work of this paper. Section 3 elaborates the theoretical background for opinion mining. Section 4 presents the methodology and experiments of the system, and Section 5 describes the conclusion.

Related Work
There are several techniques to perform opinion mining tasks. In this section, we discuss others' related work on feature extraction and opinion word extraction. Hu and Liu [1] proposed several methods to analyze customer reviews of format 3. They perform the same tasks of identifying product features on which customers have expressed their opinions and determining whether the opinions are positive or negative. However, their techniques, which are primarily based on unsupervised item set mining or association rule mining, are only suitable for reviews of formats 3 and 1 to extract product features. Frequent item sets of nouns in reviews are likely to be product features, while infrequent ones are less likely to be product features. This work also introduced the idea of using opinion words to find additional, often infrequent, features. Reviews of these formats usually consist of full sentences; the techniques are not suitable for the pros and cons of format 2, which are very brief. Liu et al. [2] presented how to extract product features from "Pros" and "Cons" as in review format 2. They proposed a supervised pattern mining method to find language patterns that identify product features. They do not need to determine opinion orientations because review format 2 is indicated by "Pros" and "Cons." Hu and Liu [3] proposed a number of techniques based on data mining and natural language processing methods to mine opinion/product features. Their work is mainly related to text summarization and terminology identification, and it does not need a training corpus to build a summary. Su et al. [4] proposed a novel mutual reinforcement approach to deal with the feature-level opinion mining problem. Their approach predicted opinions relating to different product features without the explicit appearance of product feature words in reviews; they aim to mine the hidden sentiment link between product features and opinion words and then build the association. An approach for mining product features and opinions based on consideration of syntactic information and semantic information is presented in [5]. The methods acquire relations based on the fixed positions of words.
However, the approaches are not effective in many cases. Turney [6] presented a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. Wu et al. [7] implemented the extraction of relations between product features and expressions of opinions. Relation extraction is an important subtask of opinion mining for the relations between more than one product feature and the different opinion words on each of them. Wong and Lam [8, 9] employ hidden Markov models and conditional random fields, respectively, as the underlying learning method for extracting product features. Pang et al. [10], Mras and Carroll [11], and Gamon [12] use data from movie reviews, customer feedback reviews, and product reviews. They use several statistical feature selection methods and directly apply machine learning techniques. These experiments show that machine learning techniques alone do not perform well on sentiment classification. They show that the presence or absence of a word seems to be more indicative of the content than the frequency of a word. Zhang and Liu [13] aimed to identify such opinionated noun features. The sentences involved are objective sentences that nevertheless imply positive or negative opinions. They proposed a method to deal with the problem of finding product features that are nouns or noun phrases that are not subjective but opinionated.

Mining Opinion at the Feature Level
In this paper, we only focus on mining opinions at the feature level. This task is not only technically challenging because of the need for natural language processing, but also very useful in practice. For example, businesses always want to find public or consumer opinions about their products and services from commercial web sites. Potential customers also want to know the opinions of existing users before they use a service or purchase a product. Moreover, opinion mining can also provide valuable information for placing advertisements in commercial web pages. If in a page people express positive opinions or sentiments on a product, it may be a good idea to place an ad for the product. However, if people express negative opinions about the product, it is probably not wise to place an ad for it; a better idea may be to place an ad for a competitor's product. There are three main review formats on the Web, and different review formats may need different techniques to perform the opinion extraction task. Format 1 (pros and cons): the reviewer is asked to describe pros and cons separately. Format 2 (pros, cons, and detailed review): the reviewer is asked to describe pros and cons separately and also to write a detailed review. Format 3 (free format): the reviewer can write freely, that is, with no separation of pros and cons. For review formats 1 and 2, the opinion or semantic orientations (positive or negative) of the features are known because pros and cons are separated; only the product features need to be identified. We concentrate on review format 3, where we need to identify and extract both product features and opinions. This task goes to the sentence level to discover details, that is, what aspects of an object people liked or disliked. The object could be a product, a service, a topic, an individual, an organization, and so forth. For instance, in a product review sentence, it identifies product features that have been commented on by the reviewer and determines whether the comments are positive or negative.
For example, in the sentence, "The battery life of this camera is too short," the comment is on the "battery life" of the camera object and the opinion is negative. Many real-life applications require this level of detailed analysis because, in order to make product improvements, one needs to know what components and/or features of the product are liked and disliked by consumers. Such information is not discovered by sentiment and subjectivity classification [14]. To obtain such detailed aspects, we need to go to the sentence level. Two tasks are apparent: (1) identifying and extracting the features of the product on which the reviewers have expressed their opinions, called product features (for instance, in the sentence "the picture quality of this camera is amazing," the product feature is "picture quality"); (2) determining whether the opinions on the features are positive, negative or neutral. In the above sentence, the opinion on the feature "picture quality" is positive; in the sentence "the battery life of this camera is too short," the comment is on the "battery life" and the opinion is negative. A structured summary will also be produced from the mining results.

Methodology to Find Patterns for Feature and Opinion Extraction
The goal of opinion mining is to extract customer feedback data, such as opinions on products, and present the information in the most effective way that serves the chosen objectives. Customers express their opinions in review sentences with single words or phrases, and we need to extract these opinion words or phrases in an efficient way. The pattern extraction approach is useful for commercial web pages in which customers are able to write comments about products or services. Let us use the following review sentence as an example: "The battery life is long." In this sentence, the feature is "battery life" and the opinion word is "long." Therefore, we first need to identify the feature and the opinion from the sentence. Figure 1 shows the overall process for generating the results of feature-based opinion summarization. The system input is a dataset of customer reviews. We first need to perform POS tagging to parse each sentence and then identify product features and opinion words. The extracted opinion words/phrases are used to determine the opinion orientation, which is positive or negative. Finally, we summarize the opinion for each product feature based on its orientation. In this paper, we focus on feature extraction and opinion word extraction to provide opinion summarization. In the feature extraction phase, we need to perform part-of-speech tagging to identify nouns/noun phrases from the reviews that can be product features; nouns and noun phrases are most likely to be product features. POS tagging is important as it allows us to generate general language patterns. We use the Stanford POS tagger to parse each sentence and yield the part-of-speech tag of each word (whether the word is a noun, adjective, verb, adverb, etc.) and identify simple noun and verb groups (syntactic chunking), for instance:

The_DT photo_JJ quality_NN is_VBZ amazing_JJ and_CC i_FW know_VBP i_FW m_VBP going_VBG to_TO have_VB fun_NN with_IN all_PDT the_DT

After POS tagging is done, we need to extract features that are nouns or noun phrases using the pattern knowledge (see Table 1). Then, we focus on identifying the domain product features that are talked about by customers by using a manually tagged training corpus for the domain. For opinion word extraction, the extracted features are used to find the nearest opinion words that are adjectives/adverbs.
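As a rough illustration of this nearest-adjective heuristic, the sketch below tags a review sentence and pairs each known feature noun with its closest adjective. It is only a sketch of the idea under stated assumptions: NLTK's tagger stands in for the Stanford POS tagger named in the text, and the feature list is a toy example rather than the paper's manually tagged corpus.

```python
# Minimal sketch: POS-tag a review sentence and, for each known product feature
# (a noun), pick the nearest adjective as its opinion word. NLTK is used here as
# a stand-in tagger; the feature list is an illustrative assumption.
import nltk

# Resource names can vary slightly between NLTK versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

FEATURES = {"battery", "picture", "strap"}   # assumed domain-specific feature list

def nearest_opinion_word(sentence, features=FEATURES):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence.lower()))
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("NN") and word in features:
            # distance from the feature to every adjective (JJ*) in the sentence
            candidates = [
                (abs(j - i), w) for j, (w, t) in enumerate(tagged) if t.startswith("JJ")
            ]
            if candidates:
                pairs.append((word, min(candidates)[1]))  # keep the closest adjective
    return pairs

print(nearest_opinion_word("The strap is horrible and gets in the way."))
# e.g. [('strap', 'horrible')]
```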
To decide the opinion orientation of each sentence, we need to perform three subtasks. First, a set of opinion words (adjectives, as they are normally used to express opinions) is identified. If an adjective appears near a product feature in a sentence, then it is regarded as an opinion word. We can extract opinion words from the review using the extracted features, for instance:

The strap is horrible and gets in the way of parts of the camera you need access to.
In nearly 800 pictures I have found that this camera takes incredible pictures.
It comes with a rechargeable battery that does not seem to last all that long, especially if you use the flash a lot.

In the first sentence, the feature, "strap," is near the opinion word "horrible." In the second example, the feature "picture" is close to the opinion word "incredible." We found that opinion words/phrases are mainly adjectives/adverbs used to qualify product features expressed as nouns/noun phrases. In such cases, we can extract the nearby adjective as an opinion word if the sentence contains any features. However, for the third sentence, for the feature "battery," we cannot extract a nearby adjective that matches the opinion word "long"; the nearby adjective "rechargeable" does not bear an opinion for the feature "battery." Moreover, both adjectives and adverbs are good indicators of subjectivity and opinions. Therefore, we need to extract phrases containing adjectives, adverbs, verbs, and nouns that imply opinion. We also consider some verbs, like recommend, prefer, appreciate, dislike, and love, as opinion words. Some adverbs, like not, always, really, never, overall, absolutely, highly, and well, are also considered. Therefore, we extract two or three consecutive words from the POS-tagged review if their tags conform to any of the patterns. We collect all opinionated phrases of mostly two or three words, like (adjective, noun), (adjective, noun, noun), (adverb, adjective), (adverb, adjective, noun), (verb, noun), and so forth, from the processed POS-tagged reviews. The resulting patterns are used to match and identify opinion phrases in new reviews after POS tagging. However, some likely opinion words/phrases in a sentence may not be extracted by any pattern. Among the extracted patterns, most adjectives or adverbs imply an opinion for the nearest nouns/noun phrases. Table 2 describes some examples of opinion phrases.

Dataset of the System
We used an annotated customer review data set of 5 products for testing. All the reviews are from commercial web sites. Each review consists of a review title and the detail of the review text. The reviews are retagged manually based on our own feature list. Each camera review sentence is attached with the mentioned features and their associated opinion words. Therefore, we only focus on the review sentences that contain opinions on product features, for instance, "The pictures are absolutely amazing - the camera captures the minutest of details." This sentence will receive the tag picture [+3]. Words in the brackets are those we found to be associated with the corresponding opinion orientation of the feature, whether positive or negative (see Table 3).
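The two- and three-word POS patterns described above can be pictured with the following sketch. It is an illustration only: the pattern set is a small assumed subset, not the authors' Table 1, and NLTK again stands in for the Stanford POS tagger.

```python
# Minimal sketch of two/three-word POS-pattern matching (e.g. adverb + adjective +
# noun) over a POS-tagged sentence. The pattern set is an illustrative assumption.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

PATTERNS = [
    ("JJ", "NN"),            # adjective + noun          e.g. "awesome camera"
    ("RB", "JJ"),            # adverb + adjective        e.g. "really sharp"
    ("RB", "JJ", "NN"),      # adverb + adjective + noun
    ("NN", "VB", "JJ"),      # noun + verb + adjective   (coarse tag prefixes)
]

def opinion_phrases(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    found = []
    for size in (2, 3):
        for i in range(len(tagged) - size + 1):
            window = tagged[i:i + size]
            shape = tuple(t[:2] for _, t in window)   # compare coarse 2-letter tag prefixes
            if shape in PATTERNS:
                found.append(" ".join(w for w, _ in window))
    return found

print(opinion_phrases("an awesome camera with a really sharp lens"))
# e.g. ['awesome camera', 'really sharp', 'sharp lens', 'really sharp lens']
```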
Experiments
We carried out the experiments using customer reviews of 5 electronic products: two digital cameras, one DVD player, one MP3 player, and one cellular phone. All of them are used as the training data to mine patterns. These patterns are then used to extract product features from test reviews of these products. We now evaluate the proposed automatic technique to see how effective it is in identifying product features and opinions from customer reviews. In this paper, we only verify product features; we treat the sentiment orientation of opinions on those features as an ongoing process. The effectiveness of the proposed system has been verified with the review sets on these five different electronic products. All the results generated by our system are compared with the manually tagged results. We also assess the time saved by semi-automatic tagging over manual tagging. We show the comparison results with Hu and Liu's approach; our approach scores slightly higher than their results in the comparison table.

Conclusion
Most opinion mining research uses a number of techniques for mining and summarizing opinions based on features in product reviews, drawing on data mining and natural language processing methods. Review text is unstructured, and only a portion of the sentences include opinion-oriented words. In product reviews, users write comments about features of products to describe their views according to their experience and observations. The first step of opinion mining in classifying review documents is extracting features and opinion words. Therefore, an opinion mining system needs only the required sentences to be processed in order to obtain knowledge efficiently and effectively. We proposed ideas to extract patterns of features and/or opinion phrases. We showed the results of experiments on extracting pattern knowledge based on linguistic rules. We expect to achieve good results by extracting features and opinion-oriented words from review text with the help of adjectives, adverbs, nouns, and verbs. We believe that there is rich potential for future research. For identifying features, we need to extend the work to both explicit and implicit features, because both are useful for providing more accurate results in determining the polarity of a product/feature before summarizing them, rather than explicit features only.

References
[1] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), pp. 168–177, August 2004.
[2] B. Liu, M. Hu, and J. Cheng, "Opinion observer: analyzing and comparing opinions on the web," in Proceedings of the International World Wide Web Conference Committee (IW3C2 '05), pp. 10–14, Chiba, Japan, May 2005.
[3] M. Hu and B. Liu, "Mining opinion features in customer reviews," in Proceedings of the 19th International Conference on Artificial Intelligence (AAAI '04), pp. 755–760, 2004.
[4] Q. Su, X. Xu, H. Guo et al., "Hidden sentiment association in Chinese web opinion mining," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 959–968, April 2008.
[5] G. Somprasertsri and P. Lalitrojwong, "Mining feature-opinion in online customer reviews for opinion summarization," Journal of Universal Computer Science, vol. 16, no. 6, pp. 938–955.
[6] P. D. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002.
[7] Y. Wu, Q. Zhang, X. Huang, and L. Wu, "Phrase dependency parsing for opinion mining," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '09), pp. 1533–1541, August 2009.
[8] Wong and W. Lam, "Hot item mining and summarization from multiple auction Web sites," in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05), pp. 797–800, Houston, Tex, USA, November 2005.
[9] Wong and W. Lam, "Learning to extract and summarize hot item features from multiple auction web sites," Knowledge and Information Systems, vol. 14, no. 2, pp. 143–160.
[10] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02), pp. 79–86, Association for Computational Linguistics, Philadelphia, Pa, USA, July 2002.
[11] R. Mras and J. Carroll, A Comparison of Machine Learning Techniques Applied to Sentiment Classification [thesis], University of Sussex, Brighton, UK.
[12] Gamon, "Sentiment classification on customer feedback data," in Proceedings of the 20th International Conference on Computational Linguistics, p. 841, Association for Computational Linguistics, Morristown, NJ, USA.
[13] L. Zhang and B. Liu, "Identifying noun product features that imply opinions," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT '11), pp. 575–580, June 2011.
[14] B. Liu, "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing, 2nd edition.

Copyright © 2013 Su Su Htay and Khin Thidar Lynn. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

language features of review text