

Gene Expression Programming


DNA Microarrays: A Case Study

The great challenge in DNA microarray problems is finding the genes that are relevant to a particular disease or cell state among thousands of other genes. If we think of genes as variables, tackling these problems successfully would, in an ideal world, require about an order of magnitude more samples. However, in a typical DNA microarray study, the number of samples available (a few dozen, or a few hundred at best) is very small compared to the thousands of genes under study, which makes it hard to build models that generalize.

A common first step is to narrow down the search space with sophisticated discretization algorithms that filter out noise and irrelevant genes. Although good results have been reported (data size reductions of 50%-98%), the problem remains: hundreds or thousands of variables must be mined using just a few dozen samples.

The great advantage of using GeneXproTools in DNA microarray studies is that you can use the raw data and still obtain excellent results (you can also use filtered data, or even use GeneXproTools itself to filter out the noise). With so many variables and so few samples, not every model with good accuracy on the training data will also have good predictive accuracy. But by making several runs, you can select the top 10-20 models and then cross-reference the genes (attributes) used across them. You can then select and copy the most important genes from the Data Panel and create a much smaller dataset for creating the final model.
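The cross-referencing step can be sketched in a few lines of Python. This is only an illustration of the idea, not part of GeneXproTools: the function name `cross_reference`, the `min_models` parameter, and the gene lists are all hypothetical, and the sketch assumes you have already written down which genes each of your top models uses.

```python
from collections import Counter

def cross_reference(models, min_models=2):
    """Count how many models use each gene and keep the recurring ones.

    `models` is a list of gene-name lists, one list per selected model.
    Returns (gene, count) pairs sorted by descending count, then by name.
    """
    # set() ensures a gene used several times in one model counts once
    counts = Counter(gene for genes in models for gene in set(genes))
    recurring = {g: c for g, c in counts.items() if c >= min_models}
    return sorted(recurring.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical gene lists (illustrative only, not results from any study)
top_models = [
    ["d12", "d405", "d77"],
    ["d405", "d3001"],
    ["d77", "d405", "d950"],
]

ranked = cross_reference(top_models)
print(ranked)  # [('d405', 3), ('d77', 2)]
```

The genes that survive the `min_models` threshold are the candidates for the smaller dataset used to create the final model.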

Let's illustrate this with real-world DNA microarray data, using the well-studied ALL-AML Leukemia datasets (these same datasets are used in the DNA Microarray sample run of GeneXproTools 4.0, and we recommend you experiment with it, as you'll be able to see everything, including the generated code). In this problem, the training dataset consists of 38 bone marrow samples (27 ALL and 11 AML), measured over 7129 probes from 6817 human genes. The testing dataset consists of 34 samples, with 20 ALL and 14 AML. For this analysis, "0" was used to represent "ALL" and "1" to represent "AML", and the 7129 genes were numbered d0-d7128.
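As a rough sketch of this encoding, the snippet below builds the variable names and the 0/1 class labels in Python. The names `LABELS`, `encode_labels`, and `gene_names` are hypothetical, and the sample order (all ALL samples first) is an assumption made only to reproduce the class counts; the real data files need not be ordered this way.

```python
LABELS = {"ALL": 0, "AML": 1}  # class encoding described in the text
N_PROBES = 7129                # number of variables (probes)

def encode_labels(classes):
    """Map class names to the 0/1 codes used for this analysis."""
    return [LABELS[c] for c in classes]

# Variable names d0 ... d7128
gene_names = [f"d{i}" for i in range(N_PROBES)]

# Class counts from the text; the ordering is an assumption
train_classes = ["ALL"] * 27 + ["AML"] * 11
test_classes = ["ALL"] * 20 + ["AML"] * 14

y_train = encode_labels(train_classes)
y_test = encode_labels(test_classes)

print(len(y_train), sum(y_train))        # 38 11
print(len(y_test), sum(y_test))          # 34 14
print(gene_names[0], gene_names[-1])     # d0 d7128
print(round(N_PROBES / len(y_train)))    # 188 variables per training sample
```

The last line makes the imbalance concrete: there are roughly 188 variables for every training sample.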

For instance, in one study of the ALL-AML Leukemia problem, a total of 308 promising genes were identified in 25 good runs (in this case, runs with 100% training accuracy and testing accuracies between 91.18% and 97.06%). Of these 308 promising genes, only 11 (genes 759, 1881, 2287, 2407, 4362, 4846, 5485, 6040, 6587, 6638, and 6854) appeared in two or more models; and of these, only five (genes 759, 1881, 2287, 4846, and 6854) appeared in more than three models, with the most prevalent being genes 1881, 2287, and 4846, with eight, six, and four appearances, respectively. So it is a good guess that the genes most likely to be involved in ALL-AML leukemia are genes 1881, 2287, and 4846, which is an exceptionally good starting point for tackling leukemia.
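The quoted testing accuracies map onto whole numbers of misclassified samples in the 34-sample testing set. The short Python check below shows the arithmetic; the helper name `accuracy_pct` is just illustrative.

```python
N_TEST = 34  # size of the testing dataset

def accuracy_pct(n_correct, n_total=N_TEST):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * n_correct / n_total, 2)

# Accuracy for 0 to 3 misclassified testing samples
for errors in range(4):
    print(errors, accuracy_pct(N_TEST - errors))
# 0 100.0
# 1 97.06
# 2 94.12
# 3 91.18
```

So a testing accuracy between 91.18% and 97.06% means that each of these models misclassified between one and three of the 34 testing samples.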


Golub, T. R., D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander, 1999. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537.

Last modified: September 30, 2006

Cite this as:

Ferreira, C. "DNA Microarrays: A Case Study." From GeneXproTools Tutorials – A Gepsoft Web Resource. https://www.gepsoft.com/tutorial001.htm



Copyright (c) 2000-2023 Gepsoft Ltd. All rights reserved.