Abstract

Scene classification is an important and elementary problem in image understanding. It deals with a large number of scenes in order to discover the common structure shared by all the scenes in a class. It is used in medical science (X-ray, ECG, endoscopy, etc.), criminal detection, gender classification, skin classification, facial image classification, generating weather information from satellite images, and identifying vegetation types, anthropogenic structures, mineral resources, or transient changes in any of these properties. In this paper, we first propose a feature extraction method named LHOG, or Localized HOG.

We assume that an image contains some important regions that help to find similarities with images of the same class. We generate local information from an image via our proposed LHOG method. Then, by combining all the local information, we generate a global descriptor using the Bag-of-Features (BoF) method, which is finally used to represent and classify an image accurately and efficiently.

For classification, we use a Support Vector Machine (SVM), which analyzes data and recognizes patterns. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output. In our paper, we use six different classes of images.

Keywords: LHOG; Localized HOG; BoF; Scene Classification; Corner Detection.

International Journal of Computer (IJC) (2016) Volume 20, No 1, pp 13-28

1. Introduction

A scene refers to the place where an action or event occurs. It differs from an object or a texture, and the distinction depends on the distance between the observer and the target point. If the distance is small, the target fills most of the view and the image is called an object; as the distance increases, the target point becomes part of a large-scale view, which is known as a scene. Images of a computer, monitor, human, bus, truck, etc. are objects.

On the other hand, images of a football field, cricket field, bus terminal, horizon, river, mountain, forest, or the full view of a train are known as scenes. Scene classification is a problem of great interest to researchers. Scene images vary widely: a scene may vary in scale, rotation, and illumination, as well as in its two-dimensional (2D) and three-dimensional (3D) appearance. Existing features used for scene classification are based only on color, shape, texture, or other visual parts of the image, and most of them are single descriptors.

Such single-feature descriptors cannot achieve high accuracy and effectiveness. We therefore propose a new approach that first chooses interesting parts of an image, which help to find similarities between images of the same class and to distinguish them from images of other classes. We propose a method for selecting interest points in an image, which determines the locally important patches. After selecting the points, we analyze the area surrounding each point and apply a method to generate a localized feature, which we name the LHOG feature. We then convert all local LHOG features into a global feature using the Bag-of-Features (BoF) technique [7], which gives the final descriptor of the image. In the same way, we obtain a global descriptor for every image. Finally, a Support Vector Machine (SVM) is trained on these descriptors and used to classify the descriptor of a test image.

Our method yields highly distinct representations for the different classes of the data set; samples from our data set are shown in Figure 1.

Figure 1: Sample images: (a) CU Road, (b) Zero Point.

HOG [6] is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. In [16, 17], a method was developed for extracting distinctive, scale- and rotation-invariant features of images that can be used to perform matching between different views of an object or scene.

In [18], a generative model from the statistical text literature is applied to a bag-of-visual-words representation of each image, and a multi-way classifier is subsequently trained on the topic distribution vector of each image. Shape- and appearance-based image classification has also been reported [15], but our proposed approach shows better results than these methods. Applying our method to Figure 1(a), we find 305 corners and their corresponding LHOG features; mapping all of these LHOG features yields the global descriptor (47, 51, 29, 11, 21, 48, 5, 16, 51, 26). For Figure 1(b), the 291 detected corners give the final global descriptor (15, 27, 44, 4, 17, 69, 24, 26, 40, 26). The global descriptors built from LHOG values therefore differ greatly between Figure 1(a) and (b). For this reason, we achieve good accuracy and fast results.

2. Proposed Method

Our method consists of four stages: key point detection, feature extraction, global mapping, and classification. In the key point detection stage, we localize the highly informative patches. These points are detected by a sequence of edge detection, curve extraction, and cornerness measurement. Around each interest point, a rectangular patch is analyzed to obtain statistical attributes that produce a local feature for that area. These local features are mapped to a multi-dimensional space in order to generate a global signature of a scene. In our approach, we use the state-of-the-art Canny edge detector, followed by the curvature scale space (CSS) corner detector [1] for measuring cornerness. The corner distribution in every local patch is analyzed by constructing a normalized histogram.

This histogram gives the local feature in our method. All local features of a given scene image are fed into the bag-of-features model to generate the global signature of the scene. We use a support vector machine (SVM) as our classifier. Localized HOG, or LHOG, is a feature descriptor used in computer vision and image processing for object detection and scene classification. The technique counts occurrences of gradient orientations in localized portions of an image.

It is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Figure 2 depicts our proposed approach, and the following sections describe its stages in sequence.

Figure 2: Overall design of scene classification.

3. Interest Point

Interest points, or corners, are a vital part of many image processing techniques. Interest points are located using the following step-by-step procedure.

3.1. Edge Detection

Edges are meaningful features that carry significant information about an image. Edge detection simplifies the analysis of images by drastically reducing the amount of data to be processed, while preserving useful structural information about object boundaries. In our method, we use the Canny edge detector. The steps of the Canny method [9, 10] are as follows:

a) Smooth the image with a Gaussian filter, based on the following equation:

$$ g(m,n) = G_\sigma(m,n) \otimes f(m,n), \qquad G_\sigma(m,n) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{m^2+n^2}{2\sigma^2}\right) \tag{1} $$

where f is the input image and (m, n) is the pixel coordinate. After applying Gaussian smoothing, we obtain the image shown in Figure 3.

Figure 3: Smoothed image after the Gaussian filter.

b) Compute the gradient of the smoothed image S (the output g of Eq. (1)) from its x and y partial derivatives:

$$ \nabla S = \begin{bmatrix} S_x & S_y \end{bmatrix}^T = \begin{bmatrix} \dfrac{\partial S}{\partial x} & \dfrac{\partial S}{\partial y} \end{bmatrix}^T \tag{2} $$

These are the derivatives at pixel (x, y). The gradient magnitude and orientation are

$$ |\nabla S| = \sqrt{S_x^2 + S_y^2}, \qquad \theta = \tan^{-1}\!\left(\frac{S_y}{S_x}\right) \tag{3} $$

c) Apply non-maxima suppression to the gradient magnitude to thin the edges and eliminate unimportant edge points: pixels whose gradient magnitude is not a local maximum are suppressed.

$$ G(x,y) = \begin{cases} |\nabla S|(x,y) & \text{if } |\nabla S|(x,y) > |\nabla S|(x',y') \text{ and } |\nabla S|(x,y) > |\nabla S|(x'',y'') \\ 0 & \text{otherwise} \end{cases} \tag{4} $$

where (x', y') and (x'', y'') are the neighbours of (x, y) in |∇S| along the direction normal to the edge. After applying these steps to our sample image, we obtain the edge map shown in Figure 4.

Figure 4: Edge map.

3.2. Curve Extraction and Corner Detection

Curvature is the amount by which a geometric object deviates from being flat, or from being straight in the case of a line; it is defined in different ways depending on the context. Here the curvature K is given by

$$ K(u,\sigma) = \frac{X_u(u,\sigma)\,Y_{uu}(u,\sigma) - X_{uu}(u,\sigma)\,Y_u(u,\sigma)}{\left[X_u(u,\sigma)^2 + Y_u(u,\sigma)^2\right]^{3/2}} \tag{5} $$

where $X_u(u,\sigma) = x(u) \otimes g_u(u,\sigma)$, $X_{uu}(u,\sigma) = x(u) \otimes g_{uu}(u,\sigma)$, $Y_u(u,\sigma) = y(u) \otimes g_u(u,\sigma)$, $Y_{uu}(u,\sigma) = y(u) \otimes g_{uu}(u,\sigma)$, ⊗ is the convolution operator, g(u, σ) denotes a Gaussian of width σ, and $g_u(u,\sigma)$ and $g_{uu}(u,\sigma)$ are its first and second derivatives, respectively [1]. After curve extraction, we obtain the output shown in Figure 5.

Figure 5: Extracted curve.
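Eq. (5) can be illustrated with a minimal NumPy sketch. It assumes the contour is closed (so the convolutions wrap around) and samples the Gaussian over ±5σ so that truncation error stays small; the function names `gaussian_kernels` and `css_curvature` are hypothetical and are not taken from the paper or from [1].

```python
import numpy as np

# Hypothetical helper names; a sketch of Eq. (5), not the authors' code.


def gaussian_kernels(sigma):
    """Sampled Gaussian g(u, sigma) and its first and second derivatives."""
    radius = int(np.ceil(5 * sigma))              # wide support keeps truncation error small
    u = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-u**2 / (2 * sigma**2))
    g /= g.sum()                                  # unit area
    g_u = -u / sigma**2 * g                       # dg/du
    g_uu = (u**2 / sigma**4 - 1 / sigma**2) * g   # d^2 g / du^2
    return g, g_u, g_uu


def css_curvature(x, y, sigma):
    """K(u, sigma) of Eq. (5) for a closed contour sampled as x[u], y[u]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, g_u, g_uu = gaussian_kernels(sigma)
    r = len(g_u) // 2
    if len(x) <= 2 * r:
        raise ValueError("contour is too short for this sigma")

    def conv(signal, kernel):
        # Circular padding: the contour is assumed to be closed.
        padded = np.concatenate([signal[-r:], signal, signal[:r]])
        return np.convolve(padded, kernel, mode="valid")  # same length as signal

    X_u, X_uu = conv(x, g_u), conv(x, g_uu)
    Y_u, Y_uu = conv(y, g_u), conv(y, g_uu)
    return (X_u * Y_uu - X_uu * Y_u) / (X_u**2 + Y_u**2) ** 1.5


# Usage: a circle of radius 20 has curvature 1/20 everywhere.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
K = css_curvature(20 * np.cos(t), 20 * np.sin(t), sigma=3.0)
print(K.min(), K.max())   # both close to 0.05
```

Because convolving with $g_u$ equals differentiating the Gaussian-smoothed coordinate, the sketch returns the curvature of the contour at scale σ; the small residual deviation from 1/20 comes from the smoothing slightly shrinking the circle.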

Each extracted curve, from which the corner candidates are taken, is represented as $A_j = \{P_1^j, P_2^j, \ldots, P_N^j\}$, where $P_i^j = (x_i^j, y_i^j)$ are the pixels on the contour and N is the number of pixels on the contour. A contour is either closed or open: $A_j$ is closed if $|P_1^j P_N^j| \le T$, where T is usually 2 or 3.

The contour convolved with the Gaussian smoothing kernel is denoted by $A_j^{smooth} = A_j \otimes g$, where g is a digital Gaussian function whose width is controlled by σ. The curvature of each pixel on the smoothed contour is then computed by

$$ K_i^j = \frac{\Delta x_i^j\,\Delta^2 y_i^j - \Delta^2 x_i^j\,\Delta y_i^j}{\left[(\Delta x_i^j)^2 + (\Delta y_i^j)^2\right]^{3/2}}, \qquad i = 1, 2, \ldots, N \tag{6} $$

where $\Delta x_i^j = (x_{i+1}^j - x_{i-1}^j)/2$, $\Delta y_i^j = (y_{i+1}^j - y_{i-1}^j)/2$, $\Delta^2 x_i^j = (\Delta x_{i+1}^j - \Delta x_{i-1}^j)/2$, and $\Delta^2 y_i^j = (\Delta y_{i+1}^j - \Delta y_{i-1}^j)/2$. All local maxima of the absolute curvature function are included in the initial list of corner candidates.

However, this list may contain rounded corners, which need to be removed. They can be removed with an adaptive threshold [2]:

$$ T(u) = R \times \bar{K} = R \times \frac{1}{L_1 + L_2 + 1} \sum_{i=u-L_2}^{u+L_1} |K(i)| \tag{7} $$

where u is the position of the corner candidate, $L_1 + L_2$ is the size of the region of support (ROS) centred at u, and R is a coefficient. A candidate whose absolute curvature |K(u)| does not exceed T(u) is regarded as a rounded corner and removed. After curvature extraction and the removal of rounded and false corners [1], we obtain the desired interest points shown in Figure 6.

4. Feature Extraction

In a scene image, most of the area belonging to the scene is flat. A flat area generally does not contain enough clues to represent the image discriminatively; a textured area, by contrast, is very good at representing the scene content.

With this in mind, we select a patch around each corner point that was treated as an interest point in the previous section. The selected patch area is shown in Figure 7.

Figure 6: Detected corners. The small rectangle denotes the small patch around a corner.

Figure 7: Patch area.

4.1. Localized Feature

In this method, we treat every interest point as a local representation of the region around it.

In each scene, we identify interest points as corners and compute the corresponding HOG [3, 6] values around them. After corner detection, these HOG values form the Localized HOG, or LHOG, feature:

$$ \text{LHOG feature} = \text{interest point} + \text{HOG of the interest point} \tag{8} $$

In our sample image, 305 important corners are detected. For example, the corner point (255, 31) is selected and its patch area is 25 × 50 pixels. The LHOG values for Figure 7 are shown in Table 1.

4.2. Global Mapping

Global mapping represents the overall structure and distribution of the local features.

To perform it, we use bag-of-features. BoF approaches are characterized by the use of an orderless collection of image features. Because this representation lacks any structure or spatial information, it is perhaps surprising that it is powerful enough to match or exceed state-of-the-art performance in many of the applications to which it has been applied [7]. BoF generates a global feature from all of the local features.
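The mapping from many local vectors to one global vector can be sketched in NumPy as follows. The sketch assumes plain Lloyd's k-means with Euclidean distance, k = 10 clusters (the number used in the example later in this section), and a codebook learned from training LHOG vectors; the names `kmeans`, `assign` and `bof_descriptor` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch of BoF global mapping, not the authors' code.


def assign(X, centres):
    """Index of the nearest centre (Euclidean distance) for every row of X."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(X, centres)
        for j in range(k):
            members = X[labels == j]
            if len(members):          # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def bof_descriptor(local_features, centres):
    """Global descriptor: number of local vectors assigned to each codeword."""
    return np.bincount(assign(local_features, centres), minlength=len(centres))


# Usage with random stand-in data (not real LHOG vectors).
rng = np.random.default_rng(1)
centres = kmeans(rng.random((500, 81)), k=10)
descriptor = bof_descriptor(rng.random((305, 81)), centres)
print(descriptor, descriptor.sum())   # ten counts summing to 305
```

Because every local vector votes for exactly one codeword, the entries of the descriptor sum to the number of detected corners, as with the 305 corners behind the descriptor reported for Figure 1(a).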

In our method, all the LHOG values are combined, and a clustering method is applied to generate a global descriptor. The bag-of-features steps are shown in Figure 8.

Table 1: LHOG values for an interest point

Position | LHOG value
1 | 0.1822
2 | 0.6231
3 | 0.4337
4 | 0.2723
… | …
80 | 0.1284
81 | 0.1196

Each of the remaining corners likewise generates an 81 × 1 LHOG descriptor.

All of the LHOG vectors are treated as local descriptors.

Figure 8: Stepwise Bag-of-Features.

Each LHOG vector is compared with all 10 cluster centres and assigned to the one at minimum distance; for example, the first LHOG vector goes to cluster 2. Repeating this for all 305 LHOG vectors gives the global descriptor, which is thus built from the LHOG vectors using k-means clustering [5]. We applied this method to all images of the same class and of the six different classes. The representation of an image by codewords [4] using LHOG frequencies is shown in Figure 9.

Figure 9: Codewords from the sample image.

5. Classification

A classifier is a model that receives data as input and predicts the class of each given input. In supervised learning, we provide a training data set together with the group number of each sample, and then provide a test input to determine which class it resembles. In our case, we use a Support Vector Machine (SVM) [8], a supervised learning method, as the classifier. First, the SVM is trained with all the training images: it takes a set of descriptors and their image group numbers. During classification, it takes the descriptor of a test image and matches it against the trained image set.

It returns the group number of the group to which the test image is most similar, and returns nothing if the image does not closely match any group.

6. Experimental Results

Our data set contains 6 classes, and we applied stratified k-fold cross-validation [13]. The results are shown in Table 2, and the accuracy graph in Figure 10.
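A stratified split keeps each class's share equal in every fold. The pure-Python sketch below deals the images of each class round-robin over k folds after shuffling; it is an illustration, not the authors' code, and the name `stratified_folds` and the fixed seed are hypothetical. For 6 classes with 42 images each, 2 folds give 126 images per fold, 21 from every class.

```python
import random
from collections import defaultdict

# Hypothetical helper; a sketch of stratified k-fold splitting.


def stratified_folds(labels, k=2, seed=0):
    """Split sample indices into k folds that preserve each class's proportion."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)      # deal class members round-robin
    return folds


labels = [c for c in range(6) for _ in range(42)]   # 6 classes x 42 images
folds = stratified_folds(labels, k=2)
print([len(f) for f in folds])                      # [126, 126]
```

Each fold would serve once as the test set while the SVM is trained on the other; summing the correct predictions over both folds then gives the accuracy, e.g. 216 / 252 ≈ 85.71% for six classes.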

Figure 10: Accuracy vs. number of classes.

Table 2: Accuracy results for 2-fold cross-validation

Number of classes | Class names | Images per class | Total images | Correctly identified | Misclassified | Accuracy
2 | CU Road, IT Building | 42 | 84 | 83 | 1 | 98.80%
3 | CU Road, IT Building, Freedom Sculpture | 42 | 126 | 121 | 5 | 96.03%
4 | CU Road, IT Building, Freedom Sculpture, Shah Jalal Hall | 42 | 168 | 156 | 12 | 92.85%
5 | CU Road, IT Building, Freedom Sculpture, Shah Jalal Hall, Saheed Minar | 42 | 210 | 190 | 20 | 90.48%
6 | CU Road, IT Building, Freedom Sculpture, Shah Jalal Hall, Saheed Minar, Zero Point | 42 | 252 | 216 | 36 | 85.71%

Our data set is self-collected; sample images are shown in Figure 11.

Figure 11: Sample images of our data set: (a) CU Road, (b) IT Building, (c) Freedom Sculpture, (d) Saheed Minar, (e) Shah Jalal Hall, (f) Zero Point.

6.1. Recall and Precision Graph

In pattern recognition, precision [12] is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. When referring to the performance of a classification model, we are interested in the model's ability to correctly predict or separate the classes. When looking at the errors made by a classification model, the confusion matrix gives the full picture. Consider a three-class problem with classes A, B, and C. A predictive model may produce the following confusion matrix when tested on independent data; Table 3 shows how the model's predictions are distributed.

Table 3: Confusion matrix with notation (rows: known class, i.e. the class label in the data; columns: predicted class)

Known \ Predicted | A | B | C
A | tp_A | e_AB | e_AC
B | e_BA | tp_B | e_BC
C | e_CA | e_CB | tp_C

i) Precision: Precision measures accuracy given that a specific class has been predicted. It is defined by

$$ \text{Precision} = \frac{tp}{tp + fp} \tag{9} $$

where tp and fp are the numbers of true positive and false positive predictions for the considered class. In the confusion matrix above, the precision for class A is calculated as follows (for example, with $tp_A = 25$, $e_{BA} = 3$ and $e_{CA} = 1$):

$$ \text{Precision}_A = \frac{tp_A}{tp_A + e_{BA} + e_{CA}} = \frac{25}{25 + 3 + 1} \approx 0.86 \tag{10} $$

ii) Recall: Recall is the true positive rate. It is defined by

$$ \text{Recall} = \text{Sensitivity} = \frac{tp}{tp + fn} \tag{11} $$

where tp and fn are the numbers of true positive and false negative predictions for the considered class; tp + fn is the total number of test examples of that class. For class A, with $e_{AB} = 5$ and $e_{AC} = 2$, the recall is

$$ \text{Recall}_A = \text{Sensitivity}_A = \frac{tp_A}{tp_A + e_{AB} + e_{AC}} = \frac{25}{25 + 5 + 2} \approx 0.78 \tag{12} $$

Our experimental recall and precision results for the various numbers of classes are shown in Figure 12.
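Eqs. (9) to (12) generalize to any number of classes: precision is a diagonal entry divided by its column sum, and recall is a diagonal entry divided by its row sum. A pure-Python sketch follows, with rows as known classes and columns as predicted classes as in Table 3; the name `precision_recall` is hypothetical, and in the usage example only the class-A row and column come from the text, while the remaining entries are made up for illustration.

```python
# Hypothetical helper; per-class precision and recall from a confusion matrix.


def precision_recall(cm):
    """cm[i][j] = count of samples with known class i and predicted class j."""
    n = len(cm)
    precision, recall = [], []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))   # column sum = tp + fp
        actual_c = sum(cm[c])                           # row sum = tp + fn
        precision.append(tp / predicted_c if predicted_c else 0.0)
        recall.append(tp / actual_c if actual_c else 0.0)
    return precision, recall


cm = [[25, 5, 2],    # known A: tp_A, e_AB, e_AC (values from Eqs. (10) and (12))
      [3, 20, 4],    # known B: e_BA from Eq. (10); 20 and 4 are made up
      [1, 6, 30]]    # known C: e_CA from Eq. (10); 6 and 30 are made up
p, r = precision_recall(cm)
print(round(p[0], 2), round(r[0], 2))   # 0.86 0.78
```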

6.2. Receiver Operating Characteristics (ROC)

The ROC curve [11, 12] is a useful technique for organizing classifiers and visualizing their performance.

It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies. Consider an experiment with P positive instances and N negative instances.
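Sweeping a threshold over classifier scores from the highest to the lowest and recording (FPR, TPR) after each distinct score, with TPR = TP / P and FPR = FP / N, traces the ROC curve for the experiment just described. The pure-Python sketch below assumes a binary setting (label 1 for the P positives, 0 for the N negatives); how the paper derives its ROC curves for 3 to 6 classes is not stated, and treating each class one-vs-rest would be one option. The name `roc_points` is hypothetical.

```python
# Hypothetical helper; a sketch of ROC construction by threshold sweeping.


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering a threshold over the scores."""
    P = sum(1 for lab in labels if lab == 1)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need at least one positive and one negative instance")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # consume every instance tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))   # FPR = FP / N, TPR = TP / P
    return points


print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```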

The four possible outcomes can be arranged in the 2 × 2 confusion matrix of Table 4.

Table 4: Confusion matrix for the ROC curve

Actual value \ Prediction outcome | P' | N' | Total
P | True Positives | False Negatives | P
N | False Positives | True Negatives | N
Total | P' | N' |

TPR and FPR are calculated as follows:

$$ \text{TPR} = \frac{TP}{P} = \frac{TP}{TP + FN} \tag{13} $$

$$ \text{FPR} = \frac{FP}{N} = \frac{FP}{FP + TN} \tag{14} $$

Figure 12: Recall-precision graph for (a) 2 classes, (b) 3 classes, (c) 4 classes, (d) 5 classes, (e) 6 classes.

Our experimental 2-fold ROC curves for different numbers of scene classes are shown in Figure 13.

Figure 13: ROC curve for (a) 2 classes, (b) 3 classes, (c) 4 classes, (d) 5 classes, (e) 6 classes.

7. Conclusion and Future Work

In this paper, we propose a novel scene classification method. In our experiments, it achieves good performance, as the preceding graphs show, with an accuracy above 85 percent. Our database contains images in a variety of formats within the same class.

The total time to classify the scenes, from input data to classified output, is 31.8837 s for 84 images, so the computation time per image is 31.8837 / 84 = 0.3796 s at an image resolution of 461 × 365 pixels. This is slower than other existing scene classification systems, but our accuracy is higher than theirs.

In future work, we will concentrate on increasing accuracy and reducing processing time. We will try to achieve a processing time of 0.05 s per image so that the proposed method can be used in security systems such as criminal detection from video.