The **run** method returns a solution object, consisting of p weights and w weights.
The algorithm is built to be used with different methods to evaluate the fitness score of each chromosome. Two different criteria are already implemented: *distance* and *AUC*.
- **Distance**: for each element in the population, the WOWA function is computed on all examples of the dataset. Then, the difference between this WOWA result and the expected result given by the training dataset is computed. All these differences are summed to obtain the distance, which is the fitness score of the chromosome: the smaller the distance, the better the chromosome (see the sketch after this list).
- **AUC**: the Area Under the Curve (AUC) fitness score is designed for binary classification. To obtain the AUC, the Receiver Operating Characteristics (ROC) curve is built first. Concretely, the WOWA function is computed on all elements of the training dataset. Then, the ROC curve is built from these results. The AUC of this ROC curve is the fitness score of the element: the larger the AUC, the better the chromosome.
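As an illustration of the *distance* criterion, here is a minimal sketch of how such a fitness score can be computed. The method and parameter names are hypothetical, and the `aggregator` argument stands in for the WOWA operator parameterised by a chromosome's p and w weights; whether the library uses absolute or squared differences is not specified, so the absolute difference below is only one reasonable choice.

```java
import java.util.function.ToDoubleFunction;

class DistanceFitnessSketch {
    // aggregator stands for the WOWA operator parameterised by a chromosome's
    // candidate p and w weights (illustrative, not the library's actual API).
    static double distanceFitness(double[][] data, double[] expected,
                                  ToDoubleFunction<double[]> aggregator) {
        double distance = 0.0;
        for (int i = 0; i < data.length; i++) {
            // Aggregate the i-th training example with the candidate operator.
            double aggregated = aggregator.applyAsDouble(data[i]);
            // Accumulate the difference with the expected output.
            distance += Math.abs(aggregated - expected[i]);
        }
        return distance; // the smaller the distance, the better the chromosome
    }
}
```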
It is possible to create a new solution type with a new evaluation criterion. The new solution type must inherit from the *AbstractSolution* class and override the *computeScoreTo* method. It is also necessary to modify the *createSolutionObject* method in the *Factory* class so that it instantiates the new type.
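A hedged sketch of what such a new solution type could look like is shown below. Only the names *AbstractSolution*, *computeScoreTo*, *createSolutionObject* and *Factory* come from the library; every signature, field, and helper used here is a hypothetical placeholder, so the real declarations should be checked in the sources.

```java
// Hypothetical custom solution type using a mean squared error criterion.
// All signatures and helpers below are placeholders; check the library's
// AbstractSolution class for the real method to override.
public class MeanSquaredErrorSolution extends AbstractSolution {

    @Override
    public double computeScoreTo(double[][] data, double[] expected) {
        double error = 0.0;
        for (int i = 0; i < data.length; i++) {
            // computeWowa(...) stands for the library's WOWA evaluation of one
            // example with this chromosome's p and w weights (hypothetical call).
            double aggregated = computeWowa(data[i]);
            error += (aggregated - expected[i]) * (aggregated - expected[i]);
        }
        // Return the mean squared error as the fitness score of this chromosome.
        return error / data.length;
    }
}
```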
The method **runKFold** runs a k-fold cross-validation. Concretely, it splits the dataset into k folds. At each iteration, a single fold is retained as the validation data for testing the model, and the remaining k − 1 folds are used as training data. The cross-validation process is repeated k times, with each of the k folds used exactly once as the validation data. The k results can then be averaged to produce a single estimate.
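The fold construction described above can be illustrated with the following self-contained sketch; it only mirrors the textbook procedure and is not the library's internal implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KFoldSketch {
    // Shuffle the example indices and assign them to k folds of roughly
    // equal size; each fold is later used once as validation data while the
    // remaining k - 1 folds form the training data.
    static List<List<Integer>> split(int datasetSize, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < datasetSize; i++) indices.add(i);
        Collections.shuffle(indices);

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < indices.size(); i++) {
            // Round-robin assignment keeps fold sizes balanced.
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }
}
```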
For each tested fold, the Area Under the Curve is also computed to evaluate the classification performance (this only works when the expected vector contains 0 and 1 values, i.e. for binary classification).
As output, the method **runKFold** returns a HashMap that contains, for each fold, the best solution and the AUC corresponding to this solution.
The method **runKFold** takes as arguments the dataset (data and expected results), the number of folds used in the cross-validation, and a value that can increase the number of alerts if this number is too low.
This last argument is useful to increase the penalty for not detecting an alert.
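A hedged usage sketch of **runKFold** is given below. Apart from the method name and the fact that it returns a HashMap associating the best solution of each fold with its AUC, everything here (the class name of the algorithm, the parameter order, the exact argument types) is an assumption made for illustration.

```java
import java.util.HashMap;

public class RunKFoldExample {
    public static void main(String[] args) {
        // Toy dataset: each row is an example; expected[i] is its label
        // (1 marks an alert, 0 a normal example).
        double[][] data = {
            {0.2, 0.8, 0.5},
            {0.9, 0.7, 0.6},
            {0.1, 0.3, 0.2},
            {0.8, 0.9, 0.7}
        };
        double[] expected = {0.0, 1.0, 0.0, 1.0};

        // Hypothetical entry point: the real class name and constructor
        // arguments may differ in the library.
        GeneticAlgorithm algorithm = new GeneticAlgorithm();

        int folds = 2;
        int alertFactor = 3; // raises the number of alerts if there are too few

        // One entry per fold: the best solution found and the AUC it obtains
        // on the held-out validation fold.
        HashMap<AbstractSolution, Double> results =
                algorithm.runKFold(data, expected, folds, alertFactor);

        results.forEach((solution, auc) ->
                System.out.println(solution + " -> AUC = " + auc));
    }
}
```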