Abstract-In radiotherapy using 18-fluorodeoxyglucosepositron emission tomography (18F-FDG-PET), the accurate delineation of thebiological tumour volume (BTV) is a crucial step. In this study, new approachto segment the BTV in F-FDG-PET images is suggested. The technique is based onthe k-means clustering algorithm incorporating automatic optimal cluster numberestimation, using intrinsic positron emission tomography image information. Partitioningdata into a finite number of k homogenous and separate clusters (groups)without use of prior knowledge is carried out by some unsupervised partitioningalgorithm like the k-means clustering algorithm. To evaluate these resultantclusters for finding optimal number of clusters, properties such as clusterdensity, size, shape and separability are typically examined by some clustervalidation methods. Mainly the aim of clustering analysis is to find theoverall compactness of the clustering solution, for example variance withincluster should be a minimum and separation between the clusters should be amaximum.

I. INTRODUCTIONThereare several approaches to validate the segmentation techniques such as phantomstudies and the macroscopic surgical specimen obtained from histology. The useof macroscopic samples for validation of segmentation techniques in positronemission tomography (PET) images is one of the most promising approachesreported so far in clinical studies, the procedure consists of the comparisonof the tumour volumes defined on the PET data with actual tumour volumesmeasured on the macroscopic samples recorded from histology (where PET wasperformed prior to surgery). Segmentation using the cluster –based algorithmsis very popular, but the main problem in this case is the determination of theoptimal and desired number of clusters. In this, we have implemented anapproach based on k-means algorithm with an automatic estimation12 of theoptimal number of clusters, based on the maximum intensity ratio in a givenvolume of interest (VOI). II. METHEDOLOGY Calculatethe VE for a range of k (2–50clusters), and the optimal number which corresponds to the minimum of SVEs. This method givesgood results but consumes a significant computation time by performing theclustering for a large range of cluster values before selecting the optimalnumber of clusters.

Several approaches have been proposed in the literature toidentify the optimal cluster number to better fit the data, three of them areused.Unfortunately,the results are not promising because they are not adapted to PET imagesegmentation. So our goal in this study is to improve the k-means clusteringmethod, by incorporating an automatic determination of the optimal number ofclusters using a new criterion based on PET image features.Afteranalysing the variation of the maximum activity (intensity) of the uptake ( ) inthe VOI by scanning all slices, we conclude for all patients that the maximumintensity value decreases from middle to frontier slices, and the maximumintensity is often situated almost at the centre of the BTV. The optimalcluster number has a minimum value at the centre of the BTV, and increases fromcentral to frontier slices. This correlation between the optimal number ofclusters and the maximum intensity motivates our choice of the following sliceimage feature: Where is the maximum activity(intensity) of the uptake in the corresponding slice, is the maximum activityin all slices that encompasses tumour volume BTV inside the , and is the difference between the maximum and theminimum values of (Imax (slice)/Imax (VOI)) in the .Similarlyto , the new criterion , has a maximum value for the middleslices and decreases for the frontiers of the BTV.

Note that the r valuesrange from ‘0’ to ‘1’ for all patients. 1.Modelling: This section is dedicated to finding arelationship between the optimal number of clusters k, and the new criterion r.This relationship could be used to determine the optimal cluster number for thesegmentation of new PET images using only the new slice image feature r.After analysing the variation of k in function of r criterion(for all patients included in this study), we use two fitting models: anexponential and a power function given by (a) and (b), respectively,k= ? ? e?. r + 1 (a)k= a ? + 1 (b)Where?, ?, a, b are coefficients of fitting models and ris the proposed criterion. The fitting accuracy evaluation is based on the R-squarecriterion.

Note that we added ‘1’ to the original fitting equation to avoidclustering the image with one cluster for the high values of r.2.Generalisation: The aim of this step is to automate thechoice of the optimal cluster number for all patients using one correspondingrelationship function by defining a generalised model for all patients.

Forthis reason, we have divided the database randomly into two parts of 50% each.The first part (validation set), contains three patients is used for optimisingthe model coefficients and fixing the optimal power and exponential generalisedmodel. The second part (test set), contains four patients, is used for testingthe accuracy of the fixed optimal model. Accordingto the R-square criterion, the optimal exponential and power generalisedfunction can be rewritten as follows: k= 46.

52e?5.918 × r+ 1 k= 1.683r?1.264 + 1 kmean alogorithm 3 : Fig1. K-meanclustering algorithm 3.

Optimization by new technique: Theunderlying idea behind it is an analysis of the “movement” of objects betweenclusters, considered either forward from k to k+1 or backwards from k+1 to kgroups. In other words we find the movement in membership or joint probabilityaround k groups. The joint probability obtained from adjacent consecutive knumbers of groups will be used to produce a diagonally dominant probabilitymatrix for optimal value of k homogenous and separable groups. The maximumnormalised value of the trace as the greatest value for k within the rangetested, will be defined as the optimal value of k clusters for the givendataset. Formally, we may describe our approach as follows. For a given choiceof k = number of clusters, a given choice of clustering technique U, and agiven choice of V = set of parameters v1..

.vn used to control the clusteringtechnique, we first construct a set of clusters (U,V) = { } with i=1..k. Next, we construct setsof clusters (U,V), and (U,V) using the same clusteringtechnique.

In the work reported here, we will not vary U and V so we may writethese cluster sets more simply as , and . Now these three ( , and ) consecutive groups around k will beused to find the proportion of common objects from Ck to , and , to create a rectangular proportionmatrix of size m x n, where m and n correspond to rows and columns ofproportion matrix . We denote the proportion of dataelements in common between a particular pair of clusters, say cluster from and cluster from by , which can be abbreviated to . Similarly, we can compute , k to create a rectangular matrix ofsize n x m where n correspond to columns and m to rows. Note that in general is not equal to as they have different cardinalitymeaning k+1 not equal to k and vice versa.

To investigate how much movement ofobjects occurs from Ck to Ck+1 and from to we will apply the dot product of matrix forsize m x n ( to ) and n x m rectangular matrices ( to ) to get the joint information in square matrix m x m for clusters. Due to the row sum constant of1 the resultant square matrix is also known as a rowstochastic matrix 4 or probability and transition matrix 5. The trace ofresultant set of clusters will be normalised (average ofthe trace) to determine the set of more stable (optimal k) clusters as if the normalised trace ismaximum and may change occur in the depending on the dataset for the rangeof adjacent set of k values. This matrix will be used to determine themaximum normalised trace value for determining set of more stable or consistentclusters , that will indicate set of clusters in are stable and completely separated fromone another. The steps involves in determining the optimal value of k from theresultant clusters are follows as: 1.Create the m x n forward proportion matrix from k to k+1 = (1) 2.

Create the n x m backward proportion matrix from k+1 to k = (2) Where =1,2,3 … . . and =1,2,3 … . . Thedot product of (1) by (2) will give us a m x m matrix as in (3) below with theentries showing the joint probabilities of the forward/back movement of theobjects between the set of clusters from k to k+1 and k+1 to k. = * (3) Thenew index can be calculated from in (3) as follows: = (4) From (4) the normalised maximum trace valuefor will indicate the set of stable cluster at kvalue. In an extreme situation the normalised trace is equal to 1, that iswhere the set of clusters in will keep splitting until the value ofthe trace is 1 and it may decrease or increase as we continue but will alwaysstay less than 1.

III. CONCLUSION A new unsupervised cluster-based approach for segmenting the BTV in F-FDG-PET images is introduced. The system is morereliable and has very less error. It can be improved by technique used in determining an optimal value of K in K-means clustering, for which k-means clustering it uses amethod to find an optimal value of k number of clusters, using the features andvariables inherited from datasets. The new proposed method is based oncomparison of movement of objects forward/back from k to k+1 and k+1 to k setof clusters to find the joint probability, which is different from the othermethods and indexes that are based on the distance.