Data Normalization
  • Dear Gepsoft Team,


    I have the following question:

    While developing a trading rule, I ran into the problem that the resulting algorithm was usable only for the instrument on which the rule was developed. I believe this is because different tradeables have different price levels. For example, over the last 14 years the EUR/USD currency pair has traded in a range of approx. 0.8 to 1.5, whereas the EUR/JPY currency pair has traded between approx. 90 and 140. Clearly, the same algorithm may yield very different results when applied to numbers around 1 than when applied to numbers around 100.

    Now: do I understand correctly that data normalization could solve this issue? If so, which normalization method should be used when one cannot be 100% sure that the future trading range will stay within the past trading range? For example, back in the nineties no one really thought that gold would trade way above 1000 USD/ounce.

    Can we be sure that no information will be lost during the normalization?

    Many thanks for the answer!

    Best regards,


  • Hi Balazs,

    Yes, you’re right: data normalization should help with this problem. You can use either of the normalization methods implemented in GeneXproTools 5.0, but I would suggest you start with Standardization.

    The other alternative is to use Percentage Change instead of the raw values, which some traders seem to prefer.

    It would be great to hear what other traders think of this and also which of these methods worked best for you!  
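
For readers who want to see the two options side by side, here is a minimal sketch in plain Python. It is not the GeneXproTools implementation: the function names `standardize` and `percentage_change`, the use of the population standard deviation, and the price lists are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, then divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


def percentage_change(values):
    """Percent change from each value to the next; the result is one element shorter."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Hypothetical closing prices, made up for illustration (not real market data).
# The second series is the first one multiplied by 100.
eurusd = [1.00, 1.10, 1.21]
eurjpy = [100.0, 110.0, 121.0]

z_eurusd = standardize(eurusd)
z_eurjpy = standardize(eurjpy)
pct_eurusd = percentage_change(eurusd)
pct_eurjpy = percentage_change(eurjpy)

print(z_eurusd, z_eurjpy)      # same shape despite the different price levels
print(pct_eurusd, pct_eurjpy)  # roughly 10% per step in both series
```

Both transforms remove the difference in price level between the two made-up series. One caveat with standardization: the mean and standard deviation come from past data, so a price far outside the historical range still maps to an input the model has never seen, whereas percentage changes do not depend on the absolute level.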


  • My experience is to never use the raw O, H, L, C values, for this very reason. Market instruments hit new highs and new lows never seen in history all the time, and those unforeseen values seem to make the models go wacky (a technical term). Deltas, spreads, ratios, etc. seem to work better in my experience.

    One interesting thing I've discovered recently is how important the day of the week can be in modeling financial data, and I'm now preprocessing most of my input data with 5 columns (M, T, W, TH, F) holding binary (1/0) values.
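
A minimal sketch of the day-of-week columns described above, in plain Python. The helper name `weekday_columns` and the sample dates are hypothetical; the poster did not say how the columns were actually built.

```python
from datetime import date

# Column order assumed from the post: M, T, W, TH, F.
WEEKDAYS = ["M", "T", "W", "TH", "F"]


def weekday_columns(d):
    """Return five 0/1 flags (M, T, W, TH, F) for a trading date."""
    idx = d.weekday()  # Monday == 0 ... Sunday == 6
    if idx > 4:
        raise ValueError(f"{d} falls on a weekend")
    return [1 if i == idx else 0 for i in range(len(WEEKDAYS))]


# 2024-01-03 was a Wednesday, so only the third flag is set.
row = weekday_columns(date(2024, 1, 3))
print(dict(zip(WEEKDAYS, row)))
```

Each row gets exactly one flag set to 1, so the model can treat each weekday separately instead of reading the day as an ordered number from 1 to 5.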
