Correlation Coefficient

Choosing the Fitness Function

GeneXproTools 4.0 implements the Correlation Coefficient fitness function both with and without parsimony pressure. The version with parsimony pressure applies slight pressure on the size of the evolving solutions, favoring the discovery of more compact models.

The Correlation Coefficient fitness function of GeneXproTools 4.0 is, as expected, based on the standard correlation coefficient, which is a dimensionless index that ranges from -1 to 1 and reflects the extent of a linear relationship between the predicted values and the target values.

The correlation coefficient C_i of an individual program i is evaluated by the equation:

C_i = Cov(T,P) / (s_t * s_p)

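where Cov(T,P) is the covariance of the target and model outputs, and s_t and s_p are the corresponding standard deviations, which are given by:

$$\mathrm{Cov}(T,P) = \frac{1}{n}\sum_{j=1}^{n}\left(T_j - \bar{T}\right)\left(P_{ij} - \bar{P}_i\right)$$

$$s_t = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(T_j - \bar{T}\right)^2} \qquad s_p = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(P_{ij} - \bar{P}_i\right)^2}$$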

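where P_(ij) is the value predicted by the individual program i for fitness case j (out of n fitness cases or sample cases); T_j is the target value for fitness case j; and the means $\bar{T}$ and $\bar{P}_i$ are given by the formulas:

$$\bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j \qquad \bar{P}_i = \frac{1}{n}\sum_{j=1}^{n} P_{ij}$$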

The correlation coefficient is confined to the range [-1, 1]. When C_i = 1, there is a perfect positive linear correlation between T and P, that is, they vary in exact proportion and in the same direction. When C_i = -1, there is a perfect negative linear correlation between T and P, that is, they vary in exact proportion but in opposite directions (when T increases, P decreases). When C_i = 0, there is no linear correlation between T and P. Intermediate values describe partial correlations, and the closer C_i is to 1 or -1, the better the model.
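As a minimal sketch of the calculation described above, the following Python function computes the correlation coefficient from a list of target values and a list of model outputs. The function name `correlation_coefficient`, the use of population (divide-by-n) statistics, and the handling of constant series are assumptions made for illustration; this is not GeneXproTools code. The normalization cancels in the ratio, so sample (divide-by-n-1) statistics would give the same coefficient.

```python
import math


def correlation_coefficient(targets, predictions):
    """Correlation between target values T and model outputs P (hypothetical helper)."""
    n = len(targets)
    if n == 0 or n != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")

    # Means of the targets and of the model outputs.
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n

    # Population covariance and standard deviations (divide by n).
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(targets, predictions)) / n
    s_t = math.sqrt(sum((t - mean_t) ** 2 for t in targets) / n)
    s_p = math.sqrt(sum((p - mean_p) ** 2 for p in predictions) / n)

    # A constant series has zero spread, so the coefficient is undefined.
    # Returning 0.0 is an illustrative choice, not documented behavior.
    if s_t == 0 or s_p == 0:
        return 0.0
    return cov / (s_t * s_p)
```

For example, `correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])` is approximately 1, and reversing the second list gives approximately -1. Floating-point rounding can leave the result a hair outside [-1, 1].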

The fitness f_i of an individual program i is expressed by the equation:

f_i = 1000*C_i*C_i

and therefore ranges from 0 to 1000, with 1000 corresponding to the ideal.
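The mapping from correlation coefficient to fitness can be sketched in a few lines of Python. Because C_i is squared, a perfect negative correlation (C_i = -1) scores the same 1000 as a perfect positive one. The function name `correlation_fitness` is hypothetical and used only for illustration.

```python
def correlation_fitness(c):
    """Fitness f_i = 1000 * C_i * C_i for a correlation coefficient c."""
    return 1000.0 * c * c


# Show how the sign of the coefficient is discarded by the squaring.
for c in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(c, correlation_fitness(c))
```

The loop shows that coefficients of equal magnitude and opposite sign receive the same fitness.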

Its counterpart with parsimony pressure uses this fitness measure f_i as the raw fitness rf_i and complements it with a parsimony term.

Thus, in this case, the maximum raw fitness is rf_max = 1000, and the overall fitness fpp_i (that is, fitness with parsimony pressure) is evaluated by the formula:

where S_i is the size of the program, and S_max and S_min are, respectively, the maximum and minimum program sizes, evaluated by the formulas:

S_max = G (h + t)

S_min = G

where G is the number of genes, and h and t are the head and tail sizes (note that, for simplicity, the linking function is not taken into account). Thus, when rf_i = rf_max and S_i = S_min, fpp_i = fpp_max. This is highly improbable, though, as it requires every sub-ET to consist of just one node, which can only happen for very simple functions. The value of fpp_max is evaluated by the formula:
