Good (?) logistic regression models with 100% accuracy
  • I've started to revisit my financial market time series modeling using GXPT Logistic Regression. My prediction / response var is based on the percent change from the previous day's close: if tomorrow's market is up, the value is 1; if it is down, 0. The data is recent daily data for the EURUSD spot rate.
    The input vars I'm using are essentially standard financial technical analysis indicators, but instead of using the raw signals, I'm using the percent changes from the previous day. There are a couple of signal-processing-type indicators, Wavelets and FFT, that I'm also including.
    I'm used to scratching my head, not because I can't get a decent model, but just the opposite. My model has ZERO errors in the validation process. I've got to be doing something wrong. I've never built a model with any tool that was "perfect" involving market data.

    I've attached the model I'm referring to. I've tried just about every fitness function available and I get the same zero-error results. It's like I can't break it even if I wanted to. Can you take a peek and see if I'm making some humongous error in my logic? No software can be that brilliant, not even yours (maybe).
  • We will be answering more in depth, but usually what happens is that the answer/target is somehow hiding in the variables you are using. So please have a hard look into your variables to rule out the hiding target problem (we will also have a look at the run you sent very soon).
  • I spent all night last night trying to find the error in my logic of the model I sent you. 
    So far it appears I'm doing everything right. I am not finding a single instance of what traders refer to as LFB ("look-forward bias").

    I confirmed from multiple data sources that my raw RSI (Relative Strength Index) indicator values were accurate; they all checked out. RSI is a standard indicator that most traders use and is found in every chart app available for trading. If there were LFB in it, it would have been detected decades ago when the formula first hit the scene. My only processing of it was a very simple formula that I use all the time and have triple-checked: the percent change of today's value compared to yesterday's (New value / Old value - 1).

    So my next idea is to build a model the same way I did, but remove the last 10 percent of the rows and then use that 10 percent subset in the scoring function. Because the data is from the past, I know what the actual output values for those rows should be, so if the model is really "error free", my scoring results should match the actual results for those rows.
    That should give me a little insight. 
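
    The holdout test described above could be sketched roughly as follows. This is only an illustration in plain Python, not anything GeneXproTools provides: the helper names `chronological_split` and `accuracy` are made up here, and the rows are assumed to be sorted oldest to newest.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], holdout_fraction: float = 0.10
) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows into (training, holdout) without shuffling.

    Hypothetical helper: the most recent rows become the holdout set,
    so the model never sees them while it is being built.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    n_holdout = max(1, int(round(len(rows) * holdout_fraction)))
    cut = len(rows) - n_holdout
    return list(rows[:cut]), list(rows[cut:])


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of holdout rows where the scored class matches the known outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

    A test like this measures how well the model generalizes to unseen rows. If the response column itself is misaligned with the inputs, the holdout rows carry the same problem and can still score perfectly.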
  • There are a few simple tests that you can do that are useful in similar situations. First make really sure that you haven’t somehow blended the dominant predictor into your response variable. It’s more common the other way around, where people create a predictor variable that, by mistake, uses the response variable and then, voilà, GeneXproTools finds a perfect solution (happens to the best of us, I’m afraid :). But since in this case the culprit (PC_RSI) is something external, please make sure that your response is not using it in some indirect way.

    Just by analyzing the correlation of each predictor variable against the response (which you can do in the Data Panel), you can see that there’s nothing overtly suspicious there, although PC_RSI has by far the highest R-square (0.1785; the second best is around 0.009).
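
    Outside of the Data Panel, the same per-predictor check can be approximated with a few lines of plain Python. This is a minimal sketch, assuming R-square here means the squared Pearson correlation between one predictor column and the 0/1 response; `r_squared` and `rank_predictors` are illustrative names, not GeneXproTools functions.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between one predictor and the response."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return (sxy * sxy) / (sxx * syy)


def rank_predictors(
    predictors: Dict[str, Sequence[float]], response: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (name, R-square) pairs sorted from strongest to weakest."""
    scores = {name: r_squared(col, response) for name, col in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

    A predictor that towers over all the others in such a ranking is the first place to look for a leak.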

    Another neat and simple thing to do is to use the new feature Add Simple Models, and add your predictor variables to the History. When you do that, GeneXproTools evaluates the rounding thresholds automatically for these simple models. In this case the PC_RSI simple model has already 96.55% and 98.74% accuracy in the training and validation, respectively.

    And another thing you can do is to create a strictly linear logistic regression model with just this variable. You can do that in GeneXproTools easily by creating a seed model with a simple structure with a head size of 3 and as many genes as you have variables. Since in this case you’re only concerned with one variable, you just need 1 gene and a head such as *.c0.d0 where d0 is your variable. Then, in the Genetic Operators Tab, choose the strategy Constant Fine-Tuning, and then do Continue in the Run Panel. If you do that, you’ll see that this linear LR model has the same performance as the simple model mentioned above.

    And yet another thing you can do is create a more sophisticated model using just the simple functions + - * and a simple program architecture and then simplifying it in the end. In this case it was also possible to create very good models with this simple architecture. For example, I created a model with 99.37% accuracy in the training and 100% in the validation.

    I hope this helps in pointing you in the right direction.


  • I found the error of my ways just 10 minutes ago... The value I was predicting (1/0, BUY/SELL) needs to be shifted by one day, which I wasn't doing, hence the look-forward bias. I feel embarrassed I didn't recognize it earlier. Sorry I pestered you with a dumb issue.
  • Just in case anyone else runs into the issue I had a few days ago when modeling market data: the prediction / label / target variable needs to come from the NEXT row, i.e. be shifted by at least one day. Otherwise you are merely "forecasting" the current day's value at the end of the day, which the model already knows. Seems simple enough, but I spaced it.
    Most of the software I have that does this kind of modeling performs this shift for me automatically. It took me a couple of days to realize it because I was focusing my questions on how valid the indicator values were. I knew something was wrong when I kept seeing perfect scores for every market type I tested.
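
    For anyone who wants to see the alignment spelled out, here is a small sketch in plain Python. The function names and the sample closes are invented for illustration; the percent change uses the New value / Old value - 1 formula mentioned earlier in the thread, and the rows are assumed to be in date order.

```python
from typing import List, Optional, Sequence


def pct_change(values: Sequence[float]) -> List[Optional[float]]:
    """New / Old - 1 for each row; the first row has no previous value."""
    if not values:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(cur / prev - 1.0)
    return out


def same_day_labels(closes: Sequence[float]) -> List[Optional[int]]:
    """LEAKY target: 1 if today's close is above yesterday's, else 0.

    Same-day indicator changes already reflect this move, so a model
    trained on it is describing today rather than forecasting tomorrow.
    """
    return [None if c is None else int(c > 0) for c in pct_change(closes)]


def next_day_labels(closes: Sequence[float]) -> List[Optional[int]]:
    """Shifted target: each row gets the direction of the NEXT day's move.

    The last row has no known outcome yet and stays None.
    """
    same = same_day_labels(closes)
    return same[1:] + [None] if same else []


# Hypothetical closes, oldest first
closes = [1.10, 1.12, 1.11, 1.15, 1.14]
print(same_day_labels(closes))  # [None, 1, 0, 1, 0]
print(next_day_labels(closes))  # [1, 0, 1, 0, None]
```

    Rows whose label is None should be dropped before training, since there is nothing to learn from (or score against) on those rows.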

    I'm only sharing this because I'm sure you'll have other market forecasting users in the future who do similar things.
    As a result of yesterday's discovery, I've since canceled the purchase of the remote private island that I was planning on retiring all of us on.
    My condolences.

  • We are very sorry, about the island, of course :-).

    I am glad you managed to root out the issue, though.
