GeneXproTools allows you to
delete records from both the training and validation/test datasets.
Record deletion is important because any dataset can contain all sorts of
outliers, which arise not only from errors introduced during data collection but also from
intrinsic noise in the data, such as noise from measuring instruments.
GeneXproTools can help you detect these outliers using different analyses and visualization tools.
For example, you can easily detect outliers in all variables with the help of adjustable
standard deviation lines (1, 2 and 3 sigmas) in the
Sequential Distribution Chart. Moreover, GeneXproTools also allows you to copy the
indexes of all the outliers for the current variable by choosing Copy Outlier IDs (3 Sigma)
in the context menu. These outlier indexes can then be pasted directly into the Delete Records Window
for the easy removal of all outliers.
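The 3-sigma rule described above can be sketched outside GeneXproTools as follows. This is only an illustration, not the program's actual implementation: the function name `outlier_ids`, the use of the population standard deviation, and the 1-based record numbering are all assumptions made for the example.

```python
from statistics import mean, pstdev


def outlier_ids(values, n_sigma=3.0):
    """Return 1-based indexes of values lying more than n_sigma
    standard deviations away from the mean of the variable.

    Assumptions (not confirmed by the GeneXproTools documentation):
    population standard deviation and 1-based record numbering.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable has no spread, hence no outliers.
        return []
    return [
        i
        for i, v in enumerate(values, start=1)
        if abs(v - mu) > n_sigma * sigma
    ]


# Twenty ordinary readings followed by one extreme reading.
values = [10.0] * 20 + [100.0]
print(outlier_ids(values))  # prints [21]
```

Note that a single extreme value also inflates the mean and the standard deviation, so in small samples the 3-sigma lines may fail to flag it; lowering `n_sigma` to 2 is the analogue of using the 2-sigma lines in the chart.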
Scatter plots are also useful for detecting outliers and GeneXproTools shows scatter plots
for all pairs of variables, including model outputs and derived variables.
The Highlight Records functionality of GeneXproTools is particularly useful in classification
and logistic regression problems, where you can combine it with different charts, for example
the normalized Bivariate Line Chart with different sorting options, to detect both
labeling errors and outliers.
Record analyses and statistics that can help identify outliers or errors
are also available in the Data Panel. For example,
error analysis is a powerful tool for understanding both your data and your models.
It is also useful for detecting errors in the data, for example by comparing misclassified records
with different prototypes, such as the class centroids
in classification and
logistic regression problems.
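The comparison of misclassified records with class centroids can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis GeneXproTools performs: the function names `class_centroids` and `suspicious_records`, the Euclidean distance, and the rule "closer to the predicted class centroid than to the labeled class centroid" are all hypothetical choices made for the example.

```python
from math import dist


def class_centroids(records, labels):
    """Return {label: centroid}, where each centroid is the per-variable
    mean of all records carrying that label."""
    sums, counts = {}, {}
    for row, label in zip(records, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, x in enumerate(row):
            acc[j] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in acc]
        for label, acc in sums.items()
    }


def suspicious_records(records, labels, predictions):
    """Return 1-based indexes of misclassified records that lie closer
    (Euclidean distance, an assumption) to the centroid of the predicted
    class than to the centroid of their labeled class. Such records are
    candidates for labeling errors and deserve a manual check."""
    centroids = class_centroids(records, labels)
    flagged = []
    for i, (row, label, pred) in enumerate(
        zip(records, labels, predictions), start=1
    ):
        if pred != label and pred in centroids:
            if dist(row, centroids[pred]) < dist(row, centroids[label]):
                flagged.append(i)
    return flagged


records = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0.5, 0.5]]
labels = [0, 0, 0, 1, 1, 1, 1]        # record 7 is labeled as class 1
predictions = [0, 0, 0, 1, 1, 1, 0]   # but the model assigns it to class 0
print(suspicious_records(records, labels, predictions))  # prints [7]
```

A flagged record is not necessarily wrong; it is simply one where the model and the class prototype agree with each other and disagree with the label, which makes it worth inspecting before deleting or relabeling it.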