Swathi Yadav
PG student
Thakur College of [email protected]

Shwetha Yadav
PG student
Thakur College of [email protected]

ABSTRACT
This survey paper classifies the large number of Voot application reviews on the Google Play Store using text mining. Reviews containing words such as nice, best, good, well and satisfactory can be classified as good reviews for this application, and reviews containing words such as bad, worst, stupid, slow and time consuming can be classified as bad reviews. Once the reviews are classified into good and bad, a larger share of bad reviews indicates that the application needs further improvement. The objective of this paper is to classify the reviews into good and bad. The paper outlines a structured approach to text analysis and uses a classification algorithm to classify the reviews.

Keywords
Classification algorithm, Google Play Store, Machine learning, Reviews, Support Vector Machine (SVM), Text data mining, Voot application

1. INTRODUCTION
In recent decades computer hardware has become far more powerful. This has boosted the database and information industry. As a result, a large number of databases and data repositories are available, and organizations store large amounts of data.

This has increased the need for powerful data analysis, which is not possible without powerful tools. Data mining tools analyze data from different perspectives and summarize the results as useful information. They operate on large amounts of data to discover hidden patterns and associations that can be useful in decision making.

Recent research on data mining of reviews on the Play Store or on social networks, aimed at extracting knowledge and patterns from large accumulated databases, is increasing. Mining such data is difficult, however. Organizing and retrieving such text is hard because it is usually written in free format. Interest in text mining has grown recently because it reveals valuable knowledge hidden in large collections of documents. Research has begun to apply text mining in many areas, for instance mining text in medicine to analyze patient histories and mining text in collected reviews.

Coming to our research, the Google Play Store is the most used application for downloading apps. One benefit of the Google Play Store is that we can see the reviews before downloading any application. This is challenging for the app owner, however, because bad reviews can harm the reputation of an organization. If an organization earns a bad reputation, it will stick with it.


That is why businesses, whether large or small, are concerned about their digital footprint. We therefore retrieve the reviews of the Voot application from the Google Play Store using text mining, and then classify those reviews with a machine learning algorithm. To classify the large number of Voot reviews on the Google Play Store, text mining is done using the SVM algorithm.

2. VOOT APPLICATION
Voot is a free streaming service featuring TV shows produced by Viacom's Indian channels. What it lacks in international content it makes up for with numerous shows and movies in regional languages. The kids section offers optional parental controls. It was created for India's favourite reality TV shows, including Bigg Boss, Splitsvilla, Roadies and more.

3. TEXT MINING
A text analysis problem generally consists of three important steps: parsing, search and retrieval, and text mining.

Parsing is the process that takes unstructured text and imposes a structure on it for further analysis. The unstructured text could be a plain text file, a weblog, an Extensible Markup Language (XML) file, a HyperText Markup Language (HTML) file, or a Word document. Parsing deconstructs the given text and renders it in a more structured way for the subsequent steps.

Search and retrieval is the identification of the documents in a corpus that contain search items such as particular words, phrases, topics, or entities like people or organizations. These search items are generally called key terms. Search and retrieval originated in the field of library science and is now used extensively by web search engines.

Text mining involves mining through a text file or resource to obtain meaningful structured data. This requires analytical tools that process text in order to extract specific keywords or key data points from what are considered relatively raw or unstructured formats. In text mining, engineered systems use techniques such as taxonomies and lexical analysis to determine which parts of a text document are valuable as mined data. Statistical models are commonly useful, and systems may also use heuristics, or algorithmic guesswork, to try to work out which parts of a text are important.

Other techniques include tagging and keyword analysis, where tools look for particular proper nouns (people, places or things) or other tags and keywords to make sense of what is being written about.

4. LITERATURE REVIEW

Sr. No. | Paper Name | Author Name | Description
1 | Social Media Mining To Analyse Students' Learning Experience | Ms. S. Aswini, Dr. Ilango Krishnamoorthy | The authors focus on students' posts to understand issues and problems in their educational experience. Heavy workload, lack of awareness of social activities, and restlessness are some of the problems students face in their curricular activities. Based on these results, a multi-label classification algorithm is implemented to classify posts reflecting students' problems.
2 | A review on text mining | Yu Zhang, Mengdong Chen, Lianzhong Liu | This paper introduces the research status of text mining. Several general models are then described to present text mining from an overall perspective. Finally, text mining work is classified by application into text categorization, text clustering, association rule extraction and trend analysis.
3 | Mining online reviews in Indonesia's priority tourist destinations using sentiment analysis and text summarization approach | Puteri Prameswari; Zulkarnain; Isti Surjandari; Enrico Laoh | The main contribution of this research is to combine two text mining techniques that had not been combined before, namely sentiment analysis and text summarization.

5. PROPOSED METHOD
Returning to our research, the Google Play Store is the most used application for downloading apps.

One advantage of the Google Play Store is that we can see the reviews before downloading any application. This is challenging for the application owner, however, because bad reviews can hurt the reputation of an organization. If an organization earns a bad reputation, it will stay with it. That is why organizations, whether large or small, are concerned about their digital footprint. We therefore retrieve the reviews of the Voot application from the Google Play Store using text mining, and then classify those reviews with a machine learning algorithm. To classify the large number of Voot reviews on the Google Play Store, text mining is done using the SVM algorithm.

5.1 ARCHITECTURE

Fig 1: Proposed architecture

5.1.1 GATHERING RAW TEXT
In this step, we collect the raw text for the Voot application from the Google Play Store. The raw texts are gathered by using the application name as the keyword.

These raw texts are available on the Google Play Store and can be retrieved easily. On the Google Play Store page from which an application is downloaded, we can see the reviews written by its users. These reviews may or may not contain the keyword 'voot', but in either case they are treated as raw text.

5.1.2 REPRESENT TEXT
After the previous step, we have some raw text to start with. In this step, the raw text is first transformed with text normalization techniques such as tokenization and case folding. After these techniques are applied, the text is in a more structured format.

Tokenization is the task of separating (also called tokenizing) words from the raw text. After tokenization, the raw text is converted into a collection of tokens, where each token is a word. Spelling variants are mapped to a single token; for example, "gud" is treated as the same token as "good".
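The normalization steps of this section can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the regular expression, the `SPELLING_VARIANTS` table and the `PROTECTED_TOKENS` look-up table are hypothetical names and contents chosen for the example. The sketch also applies the case folding look-up table that is described in the next paragraph.

```python
import re

# Hypothetical normalization table: informal spellings mapped to one canonical token.
SPELLING_VARIANTS = {"gud": "good"}

# Hypothetical look-up table of tokens that must not be case folded.
PROTECTED_TOKENS = {"WHO", "General", "Motors"}

# A token is taken to be a run of letters, digits or apostrophes (an assumption).
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9']+")


def tokenize(raw_text):
    """Split raw text into word tokens, dropping punctuation and whitespace."""
    return TOKEN_PATTERN.findall(raw_text)


def normalize(tokens):
    """Case-fold every unprotected token, then map spelling variants."""
    normalized = []
    for token in tokens:
        if token not in PROTECTED_TOKENS:
            token = token.lower()
        normalized.append(SPELLING_VARIANTS.get(token, token))
    return normalized


print(normalize(tokenize("Voot is GUD, streaming is Good!")))
# ['voot', 'is', 'good', 'streaming', 'is', 'good']
print(normalize(tokenize("WHO praised General Motors")))
# ['WHO', 'praised', 'General', 'Motors']
```

A token-level look-up table is the simplest design, but it would also keep the capital letter of an ordinary sentence-initial "General"; a phrase-level table would be needed to avoid that.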

Case folding is the technique in which all upper-case letters in a text are converted to lower case. However, terms such as WHO and General Motors would then be tokenized as who, general and motors, which changes the meaning of the text. To avoid this, a look-up table is created that stores the terms which should not be case folded.

5.1.3 SENTIMENT ANALYSIS
After the text has been represented, the main goal is to analyze its sentiment. Sentiments are the emotions associated with the content, and they can be positive or negative. Sentiment analysis determines whether the emotions are positive or negative. A review may carry a positive or a negative value for the Voot application, and sentiment analysis categorizes the reviews by these values.

5.1.4 SUPPORT VECTOR MACHINE
The support vector machine (SVM) is a machine learning technique used for the classification of instances.

SVM analyzes the instances and classifies them into their respective classes. One advantage of SVM is that it tolerates the misclassification of some instances, that is, instances that are assigned to the wrong class. To handle this, SVM introduces the concept of a margin. The training instances that lie on or inside the margin, including any misclassified ones, are the support vectors, and the margin is determined by these support vectors.

Fig 2: Classification of linearly separable instances

If the training instances are linearly separable, as shown in Fig 2, there can be multiple classifiers that separate them.

Fig 3: Multiple classifiers for linearly separable instances

If an instance that belongs to one class is assigned to the other class, that instance is misclassified. Fig 4 depicts the misclassification of a positive and a negative instance that fall in the wrong class.

Fig 4: Misclassification of training instances

Misclassification of instances causes problems for a strict classifier, so SVM is used because it allows for misclassification, as depicted in Fig 5.

Fig 5: Support vector machine

SVM is also called a margin classifier because it uses the margin to limit misclassification. By introducing the margin concept, SVM narrows the multiple candidate binary classifiers down to the one with the largest margin.

5.1.5 MATHEMATICAL MODEL
SVM has three main components: the decision boundary, the support vectors, and the margin.
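Before these components are formalized, a small sketch may help fix ideas. It is an illustration under stated assumptions, not the authors' implementation: the features (counts of the good and bad vocabulary words listed in the abstract), the bias term `b`, the labelled reviews, and the training procedure (stochastic subgradient descent on the hinge loss, a common way to fit a soft-margin linear SVM) are all choices made for this example. The hinge loss penalizes instances that fall inside the margin or on the wrong side instead of forbidding them, which is how misclassification is tolerated.

```python
import random

# Vocabularies from the abstract; using their counts as features is an assumption.
# The two-word phrase "time consuming" is left out to keep tokenization trivial.
GOOD_WORDS = {"nice", "best", "good", "well", "satisfactory"}
BAD_WORDS = {"bad", "worst", "stupid", "slow"}


def featurize(review):
    """Map a review to [number of good words, number of bad words]."""
    tokens = review.lower().split()
    good = sum(token in GOOD_WORDS for token in tokens)
    bad = sum(token in BAD_WORDS for token in tokens)
    return [float(good), float(bad)]


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Zero when x is on the correct side with margin >= 1, positive otherwise."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a soft-margin linear SVM by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_value(w, b, X[i]) < 1.0
            # Regularization step: shrinking w favours a wider margin.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss step: move the boundary away from instance i.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, review):
    """Return +1 for a good review and -1 for a bad review."""
    return 1 if decision_value(w, b, featurize(review)) >= 0 else -1


# Hypothetical labelled reviews (+1 = good, -1 = bad), invented for illustration.
TRAINING = [
    ("nice app best shows", 1),
    ("good and works well", 1),
    ("satisfactory streaming good quality", 1),
    ("worst app very slow", -1),
    ("stupid ads bad player", -1),
    ("slow and bad", -1),
]

X = [featurize(text) for text, _ in TRAINING]
y = [label for _, label in TRAINING]
w, b = train_linear_svm(X, y)

print(predict(w, b, "best app and nice shows"))
print(predict(w, b, "bad slow worst"))
```

On this toy data the learned weight on the good-word count is positive and the weight on the bad-word count is negative, so the two example reviews fall on opposite sides of the decision boundary.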

$$T = w \cdot x \quad \text{(decision boundary)} \tag{1}$$

where $T$ is a threshold according to which input instances are classified into the classes {+ve, -ve}.

$$T_i = w \cdot x_i - m \quad \text{(for input instance } x_i \text{ classified to class } -\text{ve)} \tag{2}$$

where $T_i < T$.

$$T_i = w \cdot x_i + m \quad \text{(for input instance } x_i \text{ classified to class } +\text{ve)} \tag{3}$$

where $T_i > T$.

Fig 6: SVM geometry

Case 1: If SVM classifies the input instance $x_i$ to the positive class, then according to the algorithm

$$T_i = w \cdot x_i + m.$$

Let $m = 1$. Then $T_i = w \cdot x_i + 1$ and

$$T_i - T = w \cdot x_i + 1 - w \cdot x = 1,$$

since $w \cdot x_i$ and $w \cdot x$ are almost the same.

If the actual value is $Y_i = +1$, there is no error: the actual and predicted classes are the same, so no misclassification has occurred. If the actual value is $Y_i = -1$, an error has occurred: the actual and predicted classes differ, so the instance is misclassified.

Case 2: If SVM classifies the input instance $x_i$ to the negative class, then according to the algorithm

$$T_i = w \cdot x_i - m.$$

Let $m = 1$. Then $T_i = w \cdot x_i - 1$ and

$$T_i - T = w \cdot x_i - 1 - w \cdot x = -1,$$

since $w \cdot x_i$ and $w \cdot x$ are almost the same.

If the actual value is $Y_i = -1$, there is no error: the actual and predicted classes are the same, so no misclassification has occurred. If the actual value is $Y_i = +1$, an error has occurred: the actual and predicted classes differ, so the instance is misclassified.

6. CONCLUSION
This paper presents our approach to mining text from the Google Play Store. The reviews of the Voot app are collected from the Play Store and then analyzed to extract useful information from them. This was done using the sentiment analysis approach.

Using sentiment analysis, we grouped the reviews into two categories: good reviews and bad reviews. After sentiment analysis, we used SVM (Support Vector Machine) to classify the n reviews into the class to which each belongs. This would ultimately help app owners understand where their ratings stand in the industry.
