Relative Absolute Error

Choosing the Fitness Function

GeneXproTools 4.0 implements the Relative Absolute Error (RAE) fitness function both with and without parsimony pressure. The version with parsimony pressure applies a slight pressure on the size of the evolving solutions, favoring the discovery of more compact models.

The RAE fitness function of GeneXproTools is, as expected, based on the standard relative absolute error, which, in turn, is based on the absolute error.

The relative absolute error is very similar to the relative squared error in the sense that it is also relative to a simple predictor, which is just the average of the actual values. In this case, though, the error is just the total absolute error instead of the total squared error. Thus, the relative absolute error takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor.

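Mathematically, the relative absolute error E_i of an individual program i is evaluated by the equation:

$$E_i = \frac{\sum_{j=1}^{n} \left| P_{(ij)} - T_j \right|}{\sum_{j=1}^{n} \left| T_j - \bar{T} \right|}$$

where P_(ij) is the value predicted by the individual program i for fitness case j (out of n fitness cases); T_j is the target value for fitness case j; and $\bar{T}$, the average of the target values, is given by the formula:

$$\bar{T} = \frac{1}{n} \sum_{j=1}^{n} T_j$$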

For a perfect fit, the numerator is equal to 0 and E_i = 0. So, the RAE index ranges from 0 to infinity, with 0 corresponding to the ideal.
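
The definition above can be sketched in a few lines of Python. This is only an illustration of the standard relative absolute error, not the GeneXproTools implementation; the function name `relative_absolute_error` and the sample data are hypothetical.

```python
from typing import Sequence


def relative_absolute_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Relative absolute error E_i of one program over n fitness cases."""
    if len(predicted) != len(target) or len(target) == 0:
        raise ValueError("predicted and target must be non-empty and of equal length")
    n = len(target)
    # The simple predictor: the average of the target values.
    mean_target = sum(target) / n
    # Total absolute error of the program.
    model_error = sum(abs(p - t) for p, t in zip(predicted, target))
    # Total absolute error of the simple predictor.
    baseline_error = sum(abs(t - mean_target) for t in target)
    if baseline_error == 0:
        raise ValueError("all target values are equal, so the error is undefined")
    return model_error / baseline_error


targets = [1.0, 2.0, 3.0, 4.0]
perfect = relative_absolute_error(targets, targets)        # program reproduces every target
mean_only = relative_absolute_error([2.5] * 4, targets)    # program always predicts the average
print(perfect, mean_only)
```

A program that always predicts the average of the targets scores exactly 1, so values below 1 indicate a model that beats the simple predictor and values above 1 indicate one that does worse.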

As it stands, E_i cannot be used directly as fitness since, for fitness proportionate selection, the value of fitness must increase with efficiency, whereas E_i decreases as the fit improves.

Thus, for evaluating the fitness f_i of an individual program i, the following equation is used:

$$f_i = 1000 \cdot \frac{1}{1 + E_i}$$

which ranges from 0 to 1000, with 1000 corresponding to the ideal.
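
As a rough sketch of this mapping, the snippet below converts an error value into a fitness value. It assumes the form f_i = 1000 · 1/(1 + E_i), which matches the stated range of 0 to 1000 with 1000 for a perfect fit; the function name `rae_fitness` is hypothetical.

```python
def rae_fitness(error: float) -> float:
    """Map a relative absolute error E_i >= 0 to a fitness value.

    Assumes f_i = 1000 * 1 / (1 + E_i): an error of 0 gives the maximum
    fitness of 1000, and fitness approaches 0 as the error grows.
    """
    if error < 0:
        raise ValueError("the relative absolute error cannot be negative")
    return 1000.0 / (1.0 + error)


for e in (0.0, 1.0, 3.0):
    print(e, rae_fitness(e))
```

Under this assumption, a program that is only as good as the simple predictor (E_i = 1) receives half of the maximum fitness.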

Its counterpart with parsimony pressure uses this fitness measure f_i as raw fitness rf_i and complements it with a parsimony term.

Thus, in this case, the raw maximum fitness rf_max = 1000, and the overall fitness fpp_i (that is, fitness with parsimony pressure) is evaluated by the formula:

$$\mathit{fpp}_i = \mathit{rf}_i \cdot \left( 1 + \frac{1}{5000} \cdot \frac{S_{\max} - S_i}{S_{\max} - S_{\min}} \right)$$

where S_i is the size of the program, and S_max and S_min represent, respectively, the maximum and minimum program sizes, evaluated by the formulas:

S_max = G (h + t)

S_min = G

where G is the number of genes, and h and t are the head and tail sizes (note that, for simplicity, the linking function was not taken into account). Thus, when rf_i = rf_max and S_i = S_min, fpp_i = fpp_max. This is highly improbable, though, since it requires every sub-ET to consist of a single node, which can only happen for very simple functions. The value of fpp_max is given by the formula:

$$\mathit{fpp}_{\max} = 1.0002 \cdot \mathit{rf}_{\max}$$
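
The parsimony adjustment can be sketched in Python as follows. This is a minimal illustration rather than the GeneXproTools code: the function name `fitness_with_parsimony` and its parameters are hypothetical, and the sketch assumes that the parsimony term scales the raw fitness by 1 + w · (S_max - S_i)/(S_max - S_min) with a weight w = 1/5000, so that the most compact perfect program scores 1.0002 times rf_max.

```python
def fitness_with_parsimony(raw_fitness: float, size: int, genes: int,
                           head: int, tail: int,
                           weight: float = 1 / 5000) -> float:
    """Raw fitness complemented with a small reward for compact programs.

    The weight of 1/5000 is an assumption of this sketch; it is exposed as
    a parameter so that it can be changed easily.
    """
    s_max = genes * (head + tail)  # S_max = G (h + t)
    s_min = genes                  # S_min = G (every sub-ET is a single node)
    if s_max <= s_min:
        raise ValueError("head + tail must be greater than 1")
    if not s_min <= size <= s_max:
        raise ValueError("program size must lie between S_min and S_max")
    # 1.0 for the smallest possible program, 0.0 for the largest.
    compactness = (s_max - size) / (s_max - s_min)
    return raw_fitness * (1.0 + weight * compactness)


# Three genes with head size 8 and tail size 9: S_max = 51, S_min = 3.
largest = fitness_with_parsimony(1000.0, size=51, genes=3, head=8, tail=9)
smallest = fitness_with_parsimony(1000.0, size=3, genes=3, head=8, tail=9)
print(largest, smallest)
```

Because the bonus is at most 0.02% of the raw fitness under this assumption, it mainly acts as a tie-breaker between programs of nearly equal accuracy rather than trading accuracy for size.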
