Evolving Neural Network Using Variable String Genetic Algorithm for Color Infrared Aerial Image Classification

Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing imagery. In this paper, we design an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier automatically evolves an appropriate number of hidden nodes for the neural network topology and finds a near-optimal set of connection weights globally. The backpropagation (BP) algorithm then refines these into the best connection weights. The resulting hybrid VGA-BP classifier is demonstrated on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier and the BP-MLP (multi-layer perceptron) classifier, the VGA-BP classifier shows better performance on high-resolution land cover classification.


Introduction
Coastal wetlands are important resources with many different roles in maintaining the land/sea interface system. They have a high rate of primary productivity, which helps sustain estuarine and marine systems, including commercially important species. Wetlands act as physical buffers against shoreline-damaging wave and wind energy and have a biochemical role in purifying water. They also have cultural value, providing resources for research, education and aesthetics. Since 1986, a sub-tropical inter-tidal wetland in Moreton Bay, Queensland, Australia has been investigated systematically, and its salt-marsh vegetation has been classified using image subtraction and the minimum message length principle (Dale et al., 1986; 1996; 2002). This paper investigates an evolving neural network classifier for CIR (color infrared) aerial image classification of the wetland. So far, besides conventional classifiers such as the Bayes maximum-likelihood classifier (Tou and Gonzalez, 1974), several other classifiers have been adopted, for example the Fuzzy ARTMAP classifier (Carpenter et al., 1997; Filippi and Jensen, 2006), the genetic classifier (Bandyopadhyay and Pal, 2001) and the neural network classifier (Benediktsson et al., 1990; Van Coillie et al., 2004). Among these methods, neural network classifiers combined with Evolutionary Algorithms (EAs) have developed rapidly (Yao, 1999; Yao and Xu, 2006; Yan and Zhang, 2005; Frieke et al., 2007).
The most widely used neural network model is the multi-layer perceptron (MLP), in which the connection weights can be trained with the backpropagation (BP) learning algorithm (Rumelhart et al., 1986). The BP learning approach starts with an untrained network, presents a training pattern to the input layer, passes the signals through the network and determines the output at the output layer. These outputs are compared with the target values, and any difference corresponds to an error. The error is minimized when the network outputs match the desired ones, so the weights are adjusted to reduce this measure of error (Duda et al., 2001). The essential character of the BP algorithm is gradient descent. Gradient descent depends strictly on the shape of the error surface, which may be multimodal, with several local minima. The search may therefore fall into a local minimum and converge prematurely (Hertz et al., 1991).
On the other hand, Genetic Algorithms (GAs) are randomized search and optimization techniques guided by the principles of evolution and natural genetics. They are efficient, adaptive and robust search processes that produce near-optimal solutions and can handle large, highly complex and multimodal spaces (Goldberg, 1989). It is therefore not surprising that GAs have been used to evolve neural networks (Van Rooij et al., 1996; Yao, 1999) and that GA-evolved neural networks have been applied to remote sensing land cover classification (Liu et al., 2004). By using a GA to search for a near-optimal set of initial connection weights for the BP network, the hybrid GA-BP approach has been shown to be more efficient than either GAs or the BP algorithm used alone for multi-spectral images.
In general, the neural network model for image classification is a three-layer feed-forward neural network. The number of input nodes equals the number of multi-spectral bands (the dimension of the feature space), and the number of output nodes equals the number of image categories, so the network architecture depends only on the number of hidden nodes. The architecture design is crucial to the successful application of a neural network, because the architecture has a significant impact on the network's information processing capabilities (Maniezzo, 1994).
In this paper, a hybrid of the variable string genetic algorithm (VGA) and the BP algorithm (VGA-BP) is proposed for evolving the architecture and connection weights of the three-layer neural network. With VGA, a near-optimal neural network topology (the number of hidden nodes) and a set of initial connection weights are obtained simultaneously. The BP algorithm is then used to find the best connection weights on the local error surface.

Three-layer neural network
We assume a three-layer neural network with $m$ inputs (spectral bands), $q$ outputs (categories), and $l$ hidden nodes. We present a training pattern $X=(x_1, x_2, \ldots, x_m)$ to the input layer, with component $x_i$ at the $i$th input node, pass the signals to the $j$th hidden node through the connection weights $W_{ji}$, and produce a hidden output $Y_j$ through a threshold function (sigmoid). We then pass the signals to the $k$th output node through the connection weights $V_{kj}$, and finally produce an output $O_k$ through a threshold function (sigmoid or other function) (Fig. 1). The relations between input and output can be formulated as follows:

$$net_j = \sum_{i=1}^{m} W_{ji}\, x_i + W_{j0}, \qquad Y_j = f(net_j)$$

$$net_k = \sum_{j=1}^{l} V_{kj}\, Y_j + V_{k0}, \qquad O_k = f(net_k)$$

where $W_{ji}$ is the connection weight between hidden node $j$ and input node $i$, and $W_{j0}$ is its bias; $net_j$ is the input of hidden node $j$; $V_{kj}$ is the connection weight between hidden node $j$ and output node $k$, and $V_{k0}$ is its bias; $net_k$ is the input of output node $k$.
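The function $f(net)$ is a sigmoid activation function, defined as:

$$f(net) = \frac{1}{1 + e^{-net}}$$

where $net \in (-\infty, +\infty)$. It is better to adopt an anti-symmetric function:

$$f(net) = a \tanh(b \cdot net) = a\, \frac{e^{b \cdot net} - e^{-b \cdot net}}{e^{b \cdot net} + e^{-b \cdot net}}$$

where $a=1.716$ and $b=0.666$. For these values of $a$ and $b$, $f(net)$ is nearly linear in the range $net \in [-1, +1]$, which benefits BP learning.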
Suppose we have a set of training patterns $X=\{X_1, X_2, \ldots, X_n\}$, where $n$ is the number of training patterns and each training pattern $X_i$ is an $m$-dimensional feature vector. Let $T=\{T_1, T_2, \ldots, T_n\}$ be the corresponding set of output classes, where $T_i=\{t_{i1}, t_{i2}, \ldots, t_{iq}\}$ is a $q$-dimensional class vector. If the target class for pattern $X_i$ is $k$ ($1 \le k \le q$), then $t_{ik}=1$; otherwise $t_{ik}=0$. Let $o_{ik}$ denote the actual output at output node $k$ for input training pattern $X_i$, and $t_{ik}$ its desired response. The mean square error function (MSE) for this neural network is defined as:

$$E(W) = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{q} (t_{ik} - o_{ik})^2$$

where $W$ represents all the weights in the network. This error is thus a scalar function of the weights and is minimized when the network outputs match the desired outputs.
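The forward pass and error measure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the bias-first weight layout, the function names `forward` and `mse`, and the choice to average the squared error over the n patterns are assumptions made here.

```python
import math

A, B = 1.716, 0.666  # constants of the anti-symmetric sigmoid given in the text


def f(net):
    """Anti-symmetric sigmoid f(net) = a * tanh(b * net)."""
    return A * math.tanh(B * net)


def forward(x, W, V):
    """Forward pass of a three-layer network.

    x: list of m inputs.
    W: l rows [W_j0, W_j1, ..., W_jm] (bias first; layout assumed here).
    V: q rows [V_k0, V_k1, ..., V_kl] (bias first; layout assumed here).
    """
    # Hidden outputs Y_j = f(net_j)
    y = [f(row[0] + sum(w * xi for w, xi in zip(row[1:], x))) for row in W]
    # Network outputs O_k = f(net_k)
    return [f(row[0] + sum(v * yj for v, yj in zip(row[1:], y))) for row in V]


def mse(patterns, targets, W, V):
    """Squared output error averaged over the n patterns (normalisation assumed)."""
    total = 0.0
    for x, t in zip(patterns, targets):
        o = forward(x, W, V)
        total += sum((tk - ok) ** 2 for tk, ok in zip(t, o))
    return total / len(patterns)
```

With all weights set to zero every output is f(0) = 0, so a one-hot target contributes a squared error of exactly 1 per pattern, which gives a quick sanity check of the code.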

VGA classifier
As is well known, any object, such as a pixel in an image, can be represented as a feature point in a feature space.
The VGA classifier finds decision boundaries that partition the points of different classes in the feature space. In a two-dimensional feature space the decision boundaries are approximated by a set of lines, i.e. piecewise linear segments; in a three-dimensional feature space, by a set of planes; and in an m-dimensional feature space, by a set of hyperplanes.
With the VGA algorithm, the VGA classifier searches for both the appropriate number of hyperplanes and their optimal placement in the feature space in order to obtain the best classification performance.

Hyperplanes for pattern classification
Every pixel in a multi-spectral image with $m$ bands can be represented as a feature point in an $m$-dimensional feature space. Pattern classification can thus be formulated as placing a set of hyperplanes in the feature space such that the number of misclassified training points is minimized. From elementary geometry, the equation of a hyperplane in $m$-dimensional space is:

$$s_m \cos\alpha_{m-1} + \beta_{m-1} \sin\alpha_{m-1} = d$$

with

$$\beta_{i} = s_{i} \cos\alpha_{i-1} + \beta_{i-1} \sin\alpha_{i-1}, \qquad i = m-1, \ldots, 1$$

where $(s_1, s_2, \ldots, s_m)$ represents a point in the feature space. $\alpha_i$ is the angle that the projection of the normal in the $(S_1$-$S_2$-$\ldots$-$S_{i+1})$ space makes with the $S_{i+1}$ axis. $\alpha_0$ is the angle that the projection of the normal in the $(S_1)$ space makes with the $S_1$ axis, so $\alpha_0=0$ and $\beta_1 = s_1$. $d$ is the perpendicular distance of the hyperplane from the origin. Thus $(\alpha_1, \alpha_2, \ldots, \alpha_{m-1}, d)$ specifies a hyperplane uniquely in $m$-dimensional space.
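To make the angle-and-distance parameterization concrete, the following Python sketch evaluates on which side of a hyperplane a feature point lies. It assumes the recursive form in which the left-hand side is built up as beta_i = s_i cos(alpha_{i-1}) + beta_{i-1} sin(alpha_{i-1}), starting from beta_1 = s_1; the function names are illustrative only.

```python
import math


def hyperplane_value(point, angles, d):
    """Signed offset of `point` from the hyperplane (angles, d).

    point:  (s_1, ..., s_m)
    angles: (alpha_1, ..., alpha_{m-1}) in radians
    d:      perpendicular distance of the hyperplane from the origin
    """
    beta = point[0]  # beta_1 = s_1 because alpha_0 = 0
    for s, alpha in zip(point[1:], angles):
        # beta_i = s_i * cos(alpha_{i-1}) + beta_{i-1} * sin(alpha_{i-1})
        beta = s * math.cos(alpha) + beta * math.sin(alpha)
    return beta - d


def side(point, angles, d):
    """True for the positive half-space (including the boundary), else False."""
    return hyperplane_value(point, angles, d) >= 0.0
```

In two dimensions, `angles=(0.0,)` gives the horizontal line s_2 = d, and `angles=(math.pi / 2,)` gives the vertical line s_1 = d, which matches the geometric reading of the angle as measured from the last axis.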

Description of VGA algorithm
(1) Hyperplane encoding. In the VGA algorithm, a set of hyperplanes is encoded by a binary string (chromosome). The chromosome is represented by

$$Str = (\alpha^1_1, \alpha^1_2, \ldots, \alpha^1_{m-1}, d_1, \; \ldots, \; \alpha^h_1, \alpha^h_2, \ldots, \alpha^h_{m-1}, d_h)$$

where $Str$ is the binary string that represents a set of hyperplanes, $\alpha^i_j$ is the angle $\alpha_j$ of the $i$th hyperplane, $d_i$ is its perpendicular distance, and $h$ is the number of hyperplanes. If each angle variable $\alpha$ is represented by $b_1$ bits and each perpendicular distance $d$ by $b_2$ bits, then the string length is $[(m-1) \cdot b_1 + b_2] \cdot h$. For a specific classification problem, $m$, $b_1$ and $b_2$ are constant, so the string length depends only on the number of hyperplanes $h$.
(2) Misclassified points. When an $m$-dimensional feature space is partitioned by a set of hyperplanes, each hyperplane provides two half-spaces: a positive half-space and a negative half-space. For $h$ hyperplanes, the maximum number ($M$) of resulting regions is $2^h$. Suppose the $i$th region contains some feature points and the class with the most points in that region is $j$ ($1 \le j \le q$). We then consider the $i$th region to belong to class $j$, and all other points in the region are misclassified points $miss_i$. Summing $miss_i$ over all regions (up to $2^h$) gives the total number of misclassified points, $miss$.
(3) Fitness computation. The fitness function ($f_i$) is defined as:

$$f_i = (n - miss) - \frac{h_i}{h_{max}}$$

where $n$ is the number of training patterns, $miss$ is the total number of training points misclassified by the $i$th string (chromosome), $h_{max}$ is the maximum number of hyperplanes, and $h_i$ is the number of hyperplanes of the $i$th string. A string with zero hyperplanes is defined to have zero fitness. Maximizing the fitness minimizes first the number of misclassified points and then the number of hyperplanes.
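The region labelling and fitness computation of steps (2) and (3) can be sketched as follows. This is an illustrative sketch under stated assumptions: the fitness is taken to have the form (n - miss) - h/h_max, which is one reading consistent with the description above, and the hyperplanes are given directly as (angles, d) pairs rather than decoded from a binary string.

```python
import math
from collections import Counter, defaultdict


def side(point, angles, d):
    """True if `point` lies in the positive half-space of hyperplane (angles, d)."""
    beta = point[0]
    for s, alpha in zip(point[1:], angles):
        beta = s * math.cos(alpha) + beta * math.sin(alpha)
    return beta >= d


def count_misclassified(points, labels, hyperplanes):
    """Label each occupied region by majority class and count the remaining points."""
    regions = defaultdict(Counter)
    for p, c in zip(points, labels):
        # A region is identified by the tuple of half-space signs, one per hyperplane.
        key = tuple(side(p, angles, d) for angles, d in hyperplanes)
        regions[key][c] += 1
    return sum(sum(cnt.values()) - max(cnt.values()) for cnt in regions.values())


def vga_fitness(points, labels, hyperplanes, h_max):
    """Assumed fitness form: (n - miss) - h / h_max; zero for an empty string."""
    h = len(hyperplanes)
    if h == 0:
        return 0.0
    miss = count_misclassified(points, labels, hyperplanes)
    return (len(points) - miss) - h / h_max
```

Only occupied regions are stored, so the cost grows with the number of training points rather than with the 2^h possible regions. Because h/h_max is at most 1, a string with fewer misclassified points always scores higher than one with fewer hyperplanes but more errors.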
(4) Genetic operators.
Selection: The roulette wheel selection procedure is adopted to implement a proportional selection strategy (Holland, 1975; Goldberg, 1989).
Crossover: Two strings, $i$ and $j$, with lengths $l_i$ and $l_j$ respectively, are selected from the mating pool. Let $l_i \le l_j$; string $i$ is then padded with # so that the two lengths are equal. Single-point crossover is performed on the two strings with probability $\mu_c$. Two cases may arise: 1) all the hyperplanes in the offspring are complete, i.e. all the bits of each hyperplane are either defined (0s and 1s) or all #s; 2) some hyperplanes are incomplete. In the second case, the incomplete hyperplanes are modified as follows. Let $u$ be the number of defined bits (either 0 or 1) and $t$ the total number of bits per hyperplane. For each incomplete hyperplane, all the #s are set to defined bits (0 or 1 at random) with probability $u/t$; otherwise, with probability $(1-u/t)$, all the defined bits are set to #. This modification is repeated until every incomplete hyperplane becomes complete.
Mutation: To introduce greater flexibility, the mutation operator is defined so that it can both increase and decrease the string length. The strings are padded with # so that the resulting length equals $l_{max}$, the maximum chromosome length. Each defined bit is then flipped with probability $\mu_m$; otherwise, it is set to # with probability $\mu_{m1}$. Each undefined position is set to a defined bit (0 or 1 at random) with probability $\mu_{m2}$.
Note that mutation may also produce incomplete hyperplanes, which are handled in the same way as in the crossover operation.
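The padding, single-point crossover and repair of incomplete hyperplanes described above can be sketched as below. This is a minimal sketch, assuming chromosomes are Python strings over '0', '1' and '#', that every hyperplane occupies a fixed block of `bits_per_plane` characters, and that the crossover point is drawn uniformly; the function names are hypothetical.

```python
import random

PAD = "#"


def pad(s, length):
    """Right-pad a chromosome with '#' up to `length`."""
    return s + PAD * (length - len(s))


def repair(s, bits_per_plane, rng):
    """Make every hyperplane block either fully defined or fully '#'."""
    out = []
    for start in range(0, len(s), bits_per_plane):
        block = list(s[start:start + bits_per_plane])
        t = len(block)
        u = sum(ch != PAD for ch in block)  # number of defined bits
        if 0 < u < t:  # incomplete hyperplane
            if rng.random() < u / t:
                # Fill every '#' with a random defined bit.
                block = [ch if ch != PAD else rng.choice("01") for ch in block]
            else:
                # Otherwise erase the whole hyperplane.
                block = [PAD] * t
        out.append("".join(block))
    return "".join(out)


def crossover(a, b, bits_per_plane, rng, p_c=0.8):
    """Single-point crossover on padded strings, followed by repair."""
    n = max(len(a), len(b))
    a, b = pad(a, n), pad(b, n)
    if rng.random() >= p_c:
        return a, b  # no crossover this time
    cut = rng.randrange(1, n)
    child1 = repair(a[:cut] + b[cut:], bits_per_plane, rng)
    child2 = repair(b[:cut] + a[cut:], bits_per_plane, rng)
    return child1, child2
```

A single repair pass is enough, because either branch leaves the block complete. Passing a seeded `random.Random` instance as `rng` keeps runs reproducible.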
Details about VGA classifier are described by Bandyopadhyay and Pal (2001).

Evolving neural network using VGA
Training the evolving neural network with the VGA algorithm consists of three major phases: 1) encoding the connection weights; 2) computing the fitness through the feed-forward neural network for a set of training patterns; 3) selecting parents from the current generation according to their fitness, and applying the crossover and mutation operators to the parents to generate the offspring that form the new generation. This process is iterated until a stopping criterion is met.

Connection weight encoding
The input connection weights $W$ and output connection weights $V$ (chromosomes) of a network with $l$ hidden nodes are represented by two strings, StrW and StrV.
The lengths of StrV and StrW are $(l+1)q$ and $(m+1)l$, respectively. We use real numbers to represent the connection weights. Because $m$ and $q$ are constant for a specific classification problem, the lengths of StrV and StrW vary only with $l$ during evolution. By a convenient rule of thumb (Duda et al., 2001), the number of hidden nodes is chosen such that the total number of weights in the network is roughly $n/10$ or more, but not more than the total number of training points, $n$. For the three-layer neural network, the total number of weights (excluding biases) is $l(m+q)$, so the maximum number of hidden nodes is $l_{max} \le n/(m+q)$, and the number of hidden nodes $l$ can be chosen from the range $[l_{max}/10, l_{max}]$. At initialization, the connection weight values (including biases) are real numbers chosen randomly from the range $[-1, +1]$.
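As a worked example of the sizing rule above, the short Python sketch below computes the admissible range of hidden nodes and the lengths of the two weight strings, and draws initial weights from [-1, +1]. The values n=200, m=3 and q=7 used in the usage note are illustrative inputs chosen here, not figures confirmed by the study, and integer rounding of the range bounds is an assumption.

```python
import random


def hidden_node_range(n, m, q):
    """Range [l_max/10, l_max] with l_max = n / (m + q), rounded down (assumed)."""
    l_max = n // (m + q)
    l_min = max(1, l_max // 10)
    return l_min, l_max


def string_lengths(l, m, q):
    """Lengths of StrW and StrV for l hidden nodes, biases included."""
    return (m + 1) * l, (l + 1) * q


def init_strings(l, m, q, rng):
    """Random real-valued weight strings with entries in [-1, +1]."""
    len_w, len_v = string_lengths(l, m, q)
    str_w = [rng.uniform(-1.0, 1.0) for _ in range(len_w)]
    str_v = [rng.uniform(-1.0, 1.0) for _ in range(len_v)]
    return str_w, str_v
```

For example, `hidden_node_range(200, 3, 7)` returns `(2, 20)`, and `string_lengths(17, 3, 7)` returns `(68, 126)`, so a network with 17 hidden nodes would carry 68 input-side and 126 output-side weights under these assumed dimensions.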

Fitness computation
For a pair of strings (StrV$_i$ and StrW$_i$) with the same number of hidden nodes $l_i$, we can calculate the Mean Square Error (MSE) for a set of training patterns through Equations (1)-(5). The fitness $f_i$ of the $i$th string is defined as a function of this MSE and of a penalty term $\alpha\, l_i/l_{max}$, where $l_i$ is the number of hidden nodes of the string and $\alpha$ is a positive constant. Maximizing the fitness therefore minimizes the MSE, and the term $l_i/l_{max}$ forces the minimization of the number of hidden nodes.

Genetic operators
Selection, crossover and mutation are applied in the same way as described above for the VGA algorithm. Note that the connection weights are encoded as real numbers in the range $[-1, +1]$.
In summary, the typical evolution flow of VGA-BP can be described as follows: 1) Construct a set of neural networks with randomly generated hidden structures and initial connection weights, and evaluate them on a number of training patterns. 2) Calculate the fitness of each individual (chromosome) according to the average training result calculated from Equations (1)-(7). 3) Select parents from the current generation according to their fitness. 4) Apply the crossover and mutation operators to the parents to generate the offspring that form the new generation. 5) Repeat steps 2 to 4 until a stopping criterion is met. This yields near-optimal connection weights and a near-optimal network topology. 6) Continue training with the BP algorithm, starting from the hidden nodes and connection weights found by VGA, to find the best connection weights.
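Steps 2 to 5 of the flow above can be sketched as a generic evolutionary loop with roulette wheel selection. This is a skeleton under stated assumptions, not the authors' code: the fitness, crossover and mutation functions are passed in as hypothetical callables, fitness values are assumed non-negative, and the loop additionally remembers the best individual seen so far, which the text does not specify. The BP refinement of step 6 would start from the individual returned here.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness-proportional selection; assumes non-negative fitness values."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # degenerate case: pick uniformly
    r = rng.random() * total
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if r < acc:
            return individual
    return population[-1]  # guard against floating-point round-off


def evolve(population, fitness_fn, crossover_fn, mutate_fn, generations, rng):
    """Run selection, crossover and mutation for a fixed number of generations."""
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        fits = [fitness_fn(ind) for ind in population]
        offspring = []
        while len(offspring) < len(population):
            p1 = roulette_select(population, fits, rng)
            p2 = roulette_select(population, fits, rng)
            c1, c2 = crossover_fn(p1, p2, rng)
            offspring.append(mutate_fn(c1, rng))
            offspring.append(mutate_fn(c2, rng))
        population = offspring[:len(population)]
        candidate = max(population, key=fitness_fn)
        if fitness_fn(candidate) > fitness_fn(best):
            best = candidate  # keep the best individual seen so far (assumption)
    return best
```

A fixed generation count is used as the stopping criterion for simplicity; an early exit on reaching a target fitness could be added in the same loop.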

Study Area and Materials
A salt marsh with an area of approximately 8 ha on Coomera Island (27°51′S, 153°33′E) in southeastern Queensland, Australia was selected as the study area. It is generally low-lying and hummocky, with a relative relief of <1 m. Coomera Island is approximately 10 km north of the Gold Coast, one of the major tourist resorts in Australia, and 80 km south of Brisbane, the state capital. The main tidal flooding source is a shallow inlet in the north (with another indirect source in the south). The inlet is vegetated with the Grey Mangrove (Avicennia marina). The salt marsh vegetation is mainly Marine Couch (Sporobolus virginicus) and Sarcocornia quinqueflora in a fine mosaic. These will subsequently be referred to as Sporobolus and Sarcocornia. Sporobolus grows taller and more densely in less saline areas that are locally elevated, whereas Sarcocornia is mainly found in areas of higher salinity that are prone to waterlogging, such as local depressions and areas adjacent to drainage lines (Dale et al., 1986).
The CIR photography was provided by Dale et al. (1996). The original photographs had been enlarged to a scale of 1:1100 as positive paper prints that could later be used in the field. The samples for classification were obtained through a neutral filter and through primary color filters of blue, green and red, since these separate, respectively, the green, red and infrared spectral bands on the CIR image. For each sample site, the values of image reflectance through each filter were treated as the attributes.

Results
In this section, we discuss experimental results obtained with the VGA-BP classifier and compare them with those of other classifiers. The effectiveness of the algorithm is demonstrated on a CIR image.

Dataset
The dataset used for training and testing is randomly extracted from two kinds of data: the site spectral values, which were classified using a divisive classification procedure, and data from the field sampling (Dale et al., 1986). A total of 200 samples belonging to seven land cover classes are used for training and testing. The seven classes are: 1) tall and dense Sporobolus (TDSp.); 2) mixed Sporobolus and Sarcocornia, relatively tall and dense (MTDSp.Sa.); 3) mixed plant species of medium density (MMDSp.Sa.); 4) Sarcocornia (Sa.); 5) mangrove (Man.); 6) water body (WB) and 7) bare ground (BG).

Implementation parameters of classifiers
The control parameters for the Bayes maximum-likelihood classifier, VGA classifier, BP neural network classifier and VGA-BP classifier are defined as follows:
(1) Bayes maximum-likelihood classifier. The class-conditional density is assumed to be normal, and the prior probability $P(w_i)$ is estimated from the training patterns.
(2) VGA classifier. The population size, $h_{max}$, $\mu_c$, $\mu_{m1}$ and $\mu_{m2}$ are 20, 15, 0.8, 0.1 and 0.1, respectively. $\mu_m$ varies within the range [0.015, 0.333]. The algorithm terminates if the population contains at least one string with no misclassified points. Otherwise, the algorithm is executed for 6000 generations.
(3) BP-MLP classifier. A three-layer MLP with one hidden layer of 8 hidden nodes is used. For the backpropagation training algorithm, the learning rate is 0.15, the learning rate increment is 1.004, the momentum rate, which allows the MLP network to learn more quickly when plateaus exist in the error surface, is 0.9, and the target training performance is 0.0185. The algorithm terminates if the performance falls below 0.0185. Otherwise, the algorithm is executed for 3000 epochs.
(4) VGA-BP classifier. For the VGA algorithm, there are two population groups, each of size 20; the maximum number of hidden nodes is $n/(m+q)$; the crossover probability is 0.75; and the mutation probabilities are the same as for the VGA classifier.
For the BP algorithm, all parameters are set to the same values as for the BP-MLP classifier to allow a fair comparison. The VGA algorithm is executed for 500 generations, and the BP algorithm terminates if the target training performance is met. Otherwise it is executed for 3000 epochs.

Accuracy assessment of classifiers
User's accuracy and the Kappa coefficient are used to assess quantitatively the classification capacity of the new classifier (Pal et al., 2001). If $n_i'$ points (of all the $n$ points) are classified into class $i$, then the user's accuracy ($U_i$) is defined as:

$$U_i = \frac{n_{ic}}{n_i'}$$

where $n_{ic}$ of these points have been correctly classified. User's accuracy denotes the level of purity associated with a classified region. The Kappa coefficient measures the relationship of beyond-chance agreement to expected disagreement. The estimate of Kappa ($K$) is the proportion of agreement after chance agreement is removed from consideration. The estimate of Kappa for class $i$ ($K_i$) is defined as:

$$K_i = \frac{n \cdot n_{ic} - n_i \cdot n_i'}{n \cdot n_i' - n_i \cdot n_i'}$$

where $n_i$ is the number of points that actually belong to class $i$. The numerator and denominator of the overall Kappa are obtained by summing the respective numerators and denominators of $K_i$ separately over all classes $i$.
For comparison, the Bayes maximum-likelihood classifier, VGA classifier, BP-MLP classifier and VGA-BP classifier are all run on the same training points (70% of the dataset) and the same testing points (30% of the dataset). The Bayes maximum-likelihood classifier is taken as the baseline because of its high performance in remote sensing image classification and because it is the most widely applied (Belluco et al., 2006). The VGA classifier and the BP neural network classifier are also included to determine whether the VGA-BP classifier performs better than they do.
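The two accuracy measures can be computed directly from lists of true and predicted labels, as in the Python sketch below. It assumes the per-class Kappa has the form (n·n_ic - n_i·n_i') / (n·n_i' - n_i·n_i'), with n_i the number of points truly in class i, and forms the overall Kappa by summing numerators and denominators over the classes as the text describes; the function name is hypothetical.

```python
def user_accuracy_and_kappa(true_labels, predicted_labels, classes):
    """Return ({class: (user_accuracy, kappa)}, overall_kappa)."""
    n = len(true_labels)
    nan = float("nan")
    per_class = {}
    num_sum = 0.0
    den_sum = 0.0
    for c in classes:
        n_true = sum(t == c for t in true_labels)        # n_i
        n_pred = sum(p == c for p in predicted_labels)   # n_i'
        n_corr = sum(t == c and p == c
                     for t, p in zip(true_labels, predicted_labels))  # n_ic
        num = n * n_corr - n_true * n_pred
        den = n * n_pred - n_true * n_pred
        user = n_corr / n_pred if n_pred else nan
        kappa = num / den if den else nan
        per_class[c] = (user, kappa)
        num_sum += num
        den_sum += den
    overall = num_sum / den_sum if den_sum else nan
    return per_class, overall
```

For the true labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, class "a" has user's accuracy 1 and Kappa 1, class "b" has user's accuracy 2/3 and Kappa 1/3, and the overall Kappa is 0.5, which agrees with the usual Cohen's Kappa for this confusion matrix.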
Comparative classification results of the VGA-BP classifier and the other classifiers are shown in Table 1 and Table 2. As seen from Table 1 and Table 2, the VGA-BP classifier performs better than the BP-MLP classifier in both training and testing (except for Sa.). For the class Sa., VGA-BP may have over-fitted the training data, which reduces its generalization capability and testing performance. The VGA-BP classifier recognizes the different classes consistently with a high degree of accuracy. In contrast, the other classifiers recognize some classes very well but perform much worse on others. For example, the Bayes classifier provides user's accuracies of 100.00% and 87.50% for TDSp. and Man., respectively, but its user's accuracy for MMDSp.Sa. is only 28.57%. Here we use cross-validation to achieve a low generalization error.

Pixel classification of CIR image
A 6609×6501 pixel high-resolution CIR image of Coomera Island in southern Moreton Bay, Queensland, Australia is used for classification (Fig. 2a). The land cover classification results of the different classifiers are shown in Fig. 2b-2e.
As seen from Fig. 2, although all the classifiers are able to identify Mangroves, Water Body and Bare Ground, only the VGA-BP classifier and the BP classifier identify the different plant species more clearly. In addition, the spatial pattern in Fig. 2e most closely matches the actual distribution on the ground. Note that the circular area marked in blue is outside the salt marsh. It is upland and so is not part of the training pixels. Having relatively low reflectance in the near infrared, it has been classified as water, which is not a reasonable result. Figure 3 shows the number of hidden nodes ($L_j$) corresponding to each generation during the evolutionary training. From Fig. 3, we can see how the best topology varies with the number of generations of the VGA algorithm. The best number of hidden nodes (17) is obtained just after 430 generations.
Figure 4 shows the best MSE corresponding to each generation during the evolutionary training of VGA. Figure 5 is the MSE corresponding to each epoch during BP training. Although the VGA algorithm greatly reduces the total MSE of the neural network, further improvement in training performance is achieved by applying the backpropagation weight adjustment procedure. The times taken for training and for partitioning the CIR image are 24.98 and 9.88 seconds, respectively, measured on a PC (Intel CPU 2.80 GHz, 1 GB RAM).
According to the classification accuracy results, we can conclude that water body and mangrove are easily classified, while Sporobolus and Sarcocornia are a little more difficult. The hardest classes to identify, however, are the two mixed types. This may be because the differences between the mixed classes are not so obvious in the digital data. On the ground, Sarcocornia tends to have a sprawling life form, so the density estimates may be affected. Density was measured as the number of succulent stems per quadrat, yet if the stems are nearly horizontal the apparent density may be relatively large. These two types therefore suffer from poor classification accuracy. Although the classification method still needs work to improve its accuracy, it offers an unobtrusive alternative for extending our knowledge of the nature and extent of changes to a wider area than field experiments can cover.

Conclusions and Discussion
In this paper, we have described in detail an evolving neural network classifier using the variable string genetic algorithm (VGA). We have identified some important features of the different classifiers studied for image classification. The Bayes classifier is a typical classifier, whose structure is determined by the conditional densities as well as by the prior probabilities. Although the Gaussian density is the most popular of the various density functions, prior knowledge of the density function is required for complex classification problems.
The VGA classifier attempts to place hyperplanes ($h$) in the feature space such that the number of misclassified training points is minimized. However, although the classifier can handle hyperplanes in $m$-dimensional space, choosing the maximum value of $h$ is still a practical problem (because the number of regions in the search space grows as $2^h$). A neural network can deal with complex nonlinear real-world problems, because the parameters governing the nonlinear mapping are learned at the same time as those governing the linear discriminant. However, the BP algorithm often gets trapped in a local minimum of the error function and is incapable of finding a global minimum if the error function is multimodal. One way to overcome the shortcomings of gradient-descent-based training algorithms is to adopt an evolutionary neural network. With VGA, the classifier that we designed automatically evolves an appropriate number of hidden nodes for the neural network topology and finds a near-optimal set of initial connection weights globally. BP then performs a local search from these initial connection weights and this architecture to obtain the best connection weights for the neural network. The effectiveness of the algorithm is demonstrated on CIR images. Compared with standard classifiers, such as the Bayes classifier, VGA classifier and BP-MLP classifier, the hybrid VGA-based neural network classifier shows better performance on high-resolution land cover classification. The VGA-BP classifier can be used not only for CIR images but also for multi-spectral image classification.
A notable feature of the methodology is that the neural network structure is evolved automatically while the connection weights are being evolved. Some control parameters used in the VGA and BP algorithms, such as the crossover probability, mutation probabilities, learning rate and momentum rate, are empirical values supplied by experts. Proper selection of the control parameters of the VGA and BP algorithms for different classification problems is still an open issue, and is part of our further work.

Fig. 2 Classification results of CIR image of Coomera Island by different classifiers

Fig. 3 Best number of neural network hidden nodes ($L_j$) found by VGA evolutionary training

Table 1 Results of classification accuracy for training (70% of dataset)