TOPICS IN DATA SCIENCE CP-8210
FINAL REPORT
DATA MINING

Submitted to: Abdolreza Abhari
Submitted by: Gurpreet Singh
Student Number: 500802475
Date: 01/01/2018

Introduction
Extracting useful information from a pool of data is termed data mining.
There is a huge amount of data available in the information industry, but it is useless unless it is converted into useful information and analyzed, for example to detect fraud, learn buyers' choices, control the manufacturing of products and understand the market better.

Data mining helps entrepreneurs know their customers better: their choices, the deals they seek for their money, their income and the criteria by which they like to spend. It also indicates how often a customer likes to spend and makes it possible to relate different people with similar choices. Apart from these, it also assists the corporate sector.

Data mining functions are categorized as "Descriptive" and "Classification and prediction" on the basis of the type of the data.

1. Descriptive function
It describes the basic features of the information in a database, such as:
- Class/concept description
- Mining of frequent patterns
- Mining of association
- Mining of correlation
- Mining of clusters

CLASS/CONCEPT DESCRIPTION
Class: The products to be sold by the company, for example, clothes.
Concept: The spending behaviour of customers, for example, shoppers or those who buy on a budget.
These descriptions can be derived in two ways:
- Data Characterization: Summarizing the data of the class under study, namely the 'target class'.
- Data Discrimination: Comparing the target class with a designated contrasting class.

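As a minimal sketch of these two approaches, the Python below first summarizes a target class (characterization) and then compares it with a contrasting class (discrimination). The purchase records, the field names and the class labels `budget` and `big_spender` are assumptions made up for illustration; they are not data from this report.

```python
from statistics import mean

# Hypothetical purchase records (illustrative data only).
purchases = [
    {"customer_class": "budget", "amount": 20.0},
    {"customer_class": "budget", "amount": 30.0},
    {"customer_class": "big_spender", "amount": 200.0},
    {"customer_class": "big_spender", "amount": 300.0},
]


def characterize(records, target_class):
    """Data characterization: summarize the records of the target class."""
    amounts = [r["amount"] for r in records if r["customer_class"] == target_class]
    return {"count": len(amounts), "mean_amount": mean(amounts)}


def discriminate(records, target_class, contrast_class):
    """Data discrimination: compare the target class with a contrasting class."""
    target = characterize(records, target_class)
    contrast = characterize(records, contrast_class)
    return {
        "target_mean": target["mean_amount"],
        "contrast_mean": contrast["mean_amount"],
        "difference": target["mean_amount"] - contrast["mean_amount"],
    }


summary = characterize(purchases, "budget")
comparison = discriminate(purchases, "budget", "big_spender")
```

Here `summary` describes the target class on its own, while `comparison` only makes sense relative to the second class; that is the distinction between the two approaches.
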
MINING OF FREQUENT PATTERNS
The products (patterns) that occur frequently in transactional data are termed frequent patterns.
- Frequent item set: Products that frequently appear together, such as top and bottom wear in the clothing section.
- Frequent subsequence: Products that are generally bought one after another, such as pet food followed by pet treats.
- Frequent substructure: Graphs, trees or other structural forms, which may be combined with item sets or subsequences.

MINING OF ASSOCIATION
Items that are generally bought together fall into this category. With its help, a business discovers the percentage of association between products bought together, for example that a mobile phone is bought with a mobile cover 60 percent of the time and with a screen guard 40 percent of the time.

MINING OF CORRELATION
It reveals the effect of the purchase of one product on another: negative, positive or none at all.

MINING OF CLUSTERS
It groups similar products together.
Each cluster differs from the others.

2. Classification and prediction
The class label of some items may be unknown. Classification and prediction are procedures that can be used to uncover the class or concept of the data. The derived model can be presented as:
(a) Classification (If-Then) rules
(b) Decision trees
(c) Mathematical formulae
(d) Neural networks

FUNCTIONS:
- Classification: Deriving a model that distinguishes the classes or concepts of the data. The model is based on objects whose class labels are well known.
- Prediction: Regression analysis is used to predict unknown numerical values rather than class labels.
It is also used to identify sales trends on the basis of the available data.
- Outlier analysis: Data that do not conform to the model of the available data are outliers.
- Evolution analysis: It describes objects whose behaviour changes over time.

HOW DOES CLASSIFICATION WORK?
It involves two stages:
- Building the classifier or model
- Using the classifier for classification

BUILDING THE CLASSIFIER
- It is the learning step.
- Classification algorithms build the classifier.
- The training set is made up of database tuples and their associated class labels.
- Each tuple belongs to a category or class; the tuples are also known as samples, objects or data points.

USING THE CLASSIFIER
The classifier is used for classification. This includes estimating the accuracy of the classification rules and, if the accuracy is considered adequate, applying the rules to new data tuples.

DATA MINING TASK PRIMITIVES
A data mining task can be specified as a query.
- The query is submitted to the computer.
- The query is defined in terms of data mining task primitives.
- The primitives allow the user to communicate interactively with the data mining system.

A query is specified with the following primitives:

#Task-relevant data to be mined
The part of the database that is of interest to the user. It is composed of:
- database attributes
- data warehouse dimensions of interest

#Kind of knowledge to be mined
It specifies the functions to be performed, which are:
- characterization
- discrimination
- association and correlation analysis
- classification
- prediction
- clustering
- outlier analysis
- evolution analysis

#Background knowledge
It permits the mining of information at multiple levels of abstraction, e.g. concept hierarchies.

#Interestingness measures and thresholds for pattern evaluation
The discovered patterns are evaluated against these measures.

#Presentation for visualizing the discovered patterns
It refers to the visualization of the discovered patterns by means of rules, tables, charts, decision trees, graphs etc.

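The five requirements above can be pictured as the fields of a single query object. The sketch below is only an illustration under stated assumptions: the class name `MiningTask`, its field names, the `min_support` threshold and the city-to-country hierarchy are all hypothetical and do not correspond to the query language of any particular data mining system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Kinds of knowledge listed in the text above (names shortened for the sketch).
KNOWN_FUNCTIONS = {
    "characterization", "discrimination", "association", "correlation",
    "classification", "prediction", "clustering", "outlier", "evolution",
}


@dataclass
class MiningTask:
    """Hypothetical container for the five task primitives."""
    attributes: List[str]                  # task-relevant data
    function: str                          # kind of knowledge to be mined
    concept_hierarchy: Dict[str, str] = field(default_factory=dict)  # background knowledge
    min_support: float = 0.0               # evaluation threshold (assumed to be a fraction)
    presentation: str = "table"            # how discovered patterns are shown

    def __post_init__(self):
        # Reject queries that ask for an unknown function or an invalid threshold.
        if self.function not in KNOWN_FUNCTIONS:
            raise ValueError(f"unknown mining function: {self.function}")
        if not 0.0 <= self.min_support <= 1.0:
            raise ValueError("min_support must be between 0 and 1")

    def generalize(self, value: str) -> str:
        """Map a value one level up the concept hierarchy, if a parent is known."""
        return self.concept_hierarchy.get(value, value)


task = MiningTask(
    attributes=["city", "item", "amount"],
    function="association",
    concept_hierarchy={"Toronto": "Canada", "Vancouver": "Canada"},
    min_support=0.4,
    presentation="rules",
)
```

Calling `task.generalize("Toronto")` moves from the city level to the country level, which is the sense in which a concept hierarchy lets the same query be mined at more than one level.
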
ISSUES IN DATA MINING
Data mining can be complicated because the information is not all available in a single place; it needs to be collected from varied sources. The major points of concern are:
(i) Mining methodology and user interaction
(ii) Performance issues
(iii) Diverse data type issues

[Figure: issues in data mining]

DATA WAREHOUSE
A data warehouse supports the decision making of management and exhibits the following features:
Subject oriented
It provides information related to a subject, such as sales, customers or products, so a data warehouse is considered subject oriented. In addition, it does not focus on ongoing operations but on the analysis of data for decision making.
Integrated
The data is collected from varied sources and integrated, which makes it reliable for analysis.
Time variant
The data is identified with a particular time period and provides a historical point of view.
Non volatile
The data warehouse is kept separate from the operational database, so new information does not delete or replace previously stored information.

Data warehousing comprises data cleaning, data integration and data consolidation, and follows one of two approaches: query driven and update driven. The former builds wrappers and integrators (also called mediators) on top of the source databases, while the latter integrates the data in advance and makes it available for direct querying. The update-driven approach is today's approach.

APPLICATION
Data mining is used in:
· Retail industry
· Telecommunication
· Financial data analysis
· Intrusion detection
· Biological data analysis
· Other scientific applications

Data mining in banking/finance
In the financial arena, data mining is used to predict loan payments, analyze credit policy and detect fraud.

Data mining in marketing
Similarly, in the retail industry it helps in better understanding customers, products, sales, etc.

Data mining in healthcare
It helps manage large volumes of data, as in bioinformatics, which enables study of various biological aspects such as genomics, proteomics and biomedical research.

TRENDS IN DATA MINING
Concepts in data mining are constantly evolving, for example:
· Visualization
· Application exploration
· Web mining
· Biological data mining
· Privacy protection
· Distributed data mining

References:
1. https://www.tutorialspoint.com
2. Data Mining: Practical Machine Learning Tools and Techniques, Elsevier Science, 2011