Using the GEP model in R
  • I am having a problem with using a GEP model in R. The R code for SubET2
    returns incorrect values - they should be 0s but instead are ~7s.
    I've tested the model by transcribing the code into excel and the value
    for sub-ET2 is correct i.e. 0. I've tried inputting the script into R a few different ways in R, but to
    no avail. Any ideas what's going wrong?

  • Hi,

    Thank you for posting this here and sending the files.

    I took a look at both the gep and xls files you sent and there's an error in your transcription of the Sub-ET2 code into Excel. The original code in C++ is:

    y = log(gepMax2(tanh(d[9]),(G2C1-((d[27]+(((d[24]+d[21])/2.0)*d[3]))/2.0))));

    And your Excel code is:

    =LN((MAX((TANH(D7)),(-6.04296126895962-(AVERAGE(H7,((AVERAGE(G7,F7)))))))))

    But it should be (you missed the C7 when you introduced the AVERAGE function):

    =LN(MAX(TANH(D7),(-6.04296126895962-((H7+(((G7+F7)/2)*C7))/2))))

    This code returns 7.376038623 both in Excel and in GeneXproTools and also in Excel VBA when you do Ensemble Deployment to Excel. So I'm guessing the R code is also correct.

  • Thanks for reworking my calculations - a silly error of mine that I should have picked up. My apologies.

    Re the SubET2 value of 7.37. This is the value I get when I run the R script. In most cases except Record ID #5  believe it's wrong though. The values vary between samples e.g. row 4 = 7.37 where as row 1 = 0. When SubET2 (as above) is used to calculate the modeled value in excel the returned values are correct. In R all values for SubET2 = 7.37 and the modeled value does not equal that shown in the GeneXproTools results panel. Here is a worked example of what I mean.

    Record ID #1:

    R script (copied from GeneXproTools)
    SubET1 = 8.08
    SubET2 = 7.37
    SubET3 = 38.69
    SubET 1 + 2 + 3 + 4 = 54.15

    Excel (transcribed from expression tree / R script
    SubET1 = 8.08
    SubET2 = 0
    SubET3 = 38.69
    SubET 1 + 2 + 3 + 4 = 46.78

    GeneXproTools results panel
    SubET 1 + 2 + 3 + 4 = 46.78




  • Hi,

    We tested both the Sub-ET2 and the complete model using the latest version of R (3.0.3) and we cannot find any differences between the GeneXproTools output and R. I am going to send you by email the script of the full model shown below plus the data we are using (from your gep file) that we are using for you to try and see if you still get different values:


    gepModel <- function(d)
    {
    G1C7 <- 1.40720706472976
    G2C1 <- -6.04296126895962
    G3C4 <- 3.54960783715323

    y <- 0.0

    y <- gep3Rt(((d[3]+(atan(exp(G1C7))-log(d[17])))/2.0))
    y <- y + log(max(tanh(d[10]),(G2C1-((d[28]+(((d[25]+d[22])/2.0)*d[4]))/2.0))))
    y <- y + (d[2]*(atan(d[3])/G3C4))

    return (y)
    }

    gep3Rt <- function(x)
    {
    return (if (x < 0.0) (-((-x) ^ (1.0/3.0))) else (x ^ (1.0/3.0)))
    }


    #######################################################################################

    NumericTests <- function(name, dataPath, modelFunction)
    {
    print (paste("Starting ", name , "... "))

    data <- read.table(dataPath, header=TRUE)

    counter = 1
    hasErrors = FALSE
    for (index in 1:nrow(data))
    {
    row = data[index, ];
    v <- as.vector(as.matrix(row))

    result <- modelFunction(v)
    expected <- tail(v, 1)

    if (abs(result - expected) > 0.0000001)
    {
    print(paste("WRONG record " , counter, " = ", result))
    hasErrors = TRUE
    }
    counter = counter +1
    }

    if (counter <= 1)
    {
    print("NOTHING PROCESSED")
    }

    if (hasErrors)
    {
    print(paste(name , " finished with ERRORS."))
    }
    else
    {
    print(paste(name , " finished without errors."))
    }

    }

    NumericTests("gepModel", "C:\\SupportTests.txt", gepModel)




  • When I run the script, using the .txt file of input data I get:

    "WRONG record 1 = 32.3975052714607"
    This is true for all 43 cases.

    This is is happening because the output is for the complete model not just SubET2. The correct value should be 0 for SubET2. However, it does show that your R script works and mine doesn't!

    This is my script:

    gepModel <- function(d)
    {
        G1C7 <- 1.40720706472976
        G2C1 <- -6.04296126895962
        G3C4 <- 3.54960783715323

        y <- 0.0

        y <- gep3Rt(((d[3]+(atan(exp(G1C7))-log(d[17])))/2.0))
        y <- y + log(max(tanh(d[10]),(G2C1-((d[28]+(((d[25]+d[22])/2.0)*d[4]))/2.0))))
        y <- y + (d[2]*(atan(d[3])/G3C4))

        return (y)
    }

    gep3Rt <- function(x)
    {
        return (if (x < 0.0) (-((-x) ^ (1.0/3.0))) else (x ^ (1.0/3.0)))
    }

    mod.2959<-gepModel(gep.for.bap) # gep.for.bap = my data frame read as a .csv file.

    I'll try reading in directly from a .txt file like you have done.

    It seems like the issue is my R scripting, so thank you for giving this post such attention. I will post an update soon.

    BW, Darren
  • Hi Darren,

    That's correct: the txt file I sent was for Sub-ET2, not the complete model (sorry about that; I've just sent you the correct one). But yes: you should now be able to run your R scripts.
  • I can now indeed run the scripts. Below is a modified version of the GeneXproTools output and associated implementation that permits SubET and final model outputs to be returned that I hope other readers find useful.

    gepModel <- function(d)
    {
      G3C1 <- -8.06338694418165
      G3C2 <- -5.46599658192694
     
      y  <- 0.0
      y1 <- 0.0
      y2 <- 0.0
      y3 <- 0.0
      y4 <- 0.0
     
      y1 <- ((d[2]+gep3Rt(d[5]))/2.0)
      y2 <- (gep3Rt(d[22])*gep3Rt((d[9]-((((d[7]+d[2])/2.0)-d[13])+d[13]))))
      y3 <- ((gep3Rt((G3C1+d[22]))+atan(((d[13]+G3C2)/2.0)))+gep3Rt((d[10]-d[13])))
      y4 <- (gep3Rt(d[22])-gep3Rt((d[4]+(((1.0-(d[5]-d[10]))+(d[26]-d[12]))/2.0))))
     
      y <- y1+y2+y3+y4
      z <- as.vector(as.matrix(c(y1,y2,y3,y4,y)))
      z<-t(z)
     
      return (z)
    }

    gep3Rt <- function(x)
    {
      return (if (x < 0.0) (-((-x) ^ (1.0/3.0))) else (x ^ (1.0/3.0)))
    }
    #######################################################################################

    ####
    fname<-file.choose() #select data
    ####

    NumericTests <- function(name, dataPath, modelFunction)

      data <- read.csv(file=fname,as.is=T)
      out<-data.frame(matrix(vector(), nrow(data), 5))
     
      for (i in 1:nrow(data))
      {
        row = data[i, ];    
        v <- as.vector(as.matrix(row))
       
        result <- modelFunction(v)
       
        out[i,]<-result
       
      }
      colnames(out)<-c("sub-ET1","sub-ET2","sub-ET3","sub-ET4","All sub-ETs")
      return (out)
    }

    ###
    df_pt1_2941<-NumericTests("gepModel", fname, gepModel) #modelled data
    ###

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Tagged

  • r 2