Evaluation of the rejection processes

The following functions output all the data points needed to plot accuracy-rejection curves for evaluating (partial) rejection. For rejection with hierarchical classification, a parallelised function is also available, because predicting over many thresholds can be slow.

Evaluation.Functions_Accuracy_Reject.Evaluate_AR_Flat(clf_list, Xtests, ytests, predictions, probabilities, b, scores)

Generates data points for accuracy-rejection curves with flat classification. The rejection threshold is varied with a step size of 0.01.

Parameters:

clf_list (list of scikit-learn classifiers) – Contains the trained classifier for each of the K folds, to be evaluated on the corresponding test set.
Xtests (list of matrices) – Contains the test data per fold
ytests (list of lists) – Contains the actual labels of the test data per fold
predictions (list of lists) – Contains the predictions per fold
probabilities (list of matrices) – Contains the probabilities of the predictions per fold over all the classes
b (boolean) – Is the hierarchy balanced?
scores (boolean) – Does the trained scikit-learn classifier output scores or probabilities (with predict_proba)?

Returns:

results – The following metrics are saved in a dictionary with key ‘Try fold_number’ for every fold: the accuracies (acc) and rejection percentage (perc:) for every rejection threshold, the rejection thresholds themselves (steps), the actual values (ytest), the predictions (preds), the probabilities (probs) and the lengths of the predictions (lp) and actual values (lt) per rejection threshold

Return type:

nested dictionary
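
To make the threshold sweep concrete, the following is a minimal sketch of how accuracy-rejection points could be computed for a single fold. It is not the implementation used by Evaluate_AR_Flat: the function name accuracy_rejection_points, the confidences argument (for example the highest class probability per sample) and the rule "reject when the confidence is below the threshold" are all assumptions made for illustration.

```python
def accuracy_rejection_points(y_true, y_pred, confidences, step=0.01):
    """Sweep a rejection threshold from 0 to 1 for one fold (illustrative only).

    Assumed rule: a prediction is kept when its confidence is >= threshold,
    otherwise it is rejected.
    """
    n_steps = int(round(1.0 / step)) + 1
    # Rounding avoids thresholds such as 0.30000000000000004.
    thresholds = [round(i * step, 10) for i in range(n_steps)]
    total = len(y_true)
    accuracies, rejected_pct = [], []
    for t in thresholds:
        kept = [(yt, yp) for yt, yp, c in zip(y_true, y_pred, confidences) if c >= t]
        rejected_pct.append(100.0 * (total - len(kept)) / total)
        if kept:
            accuracies.append(sum(yt == yp for yt, yp in kept) / len(kept))
        else:
            # Accuracy is undefined when every prediction is rejected.
            accuracies.append(float("nan"))
    return thresholds, accuracies, rejected_pct


# Toy data: four samples, one confident mistake-free subset.
y_true = ["a", "b", "a", "c"]
y_pred = ["a", "b", "c", "c"]
confidences = [0.95, 0.80, 0.40, 0.60]
steps, acc, perc = accuracy_rejection_points(y_true, y_pred, confidences)
```

Plotting perc against acc then gives one accuracy-rejection curve per fold.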

Evaluation.Functions_Accuracy_Reject.Evaluate_AR(clf_list, Xtests, ytests, predictions, greedy=True)

Generates data points for accuracy-rejection curves with hierarchical classification. The rejection threshold is varied with a step size of 0.01.

Parameters:

clf_list (list of scikit-learn classifiers) – Contains the trained classifier for each of the K folds, to be evaluated on the corresponding test set.
Xtests (list of matrices) – Contains the test data per fold
ytests (list of lists) – Contains the actual labels of the test data per fold
predictions (list of lists) – Contains the predictions per fold
greedy (boolean, optional) – Perform greedy (True) or non-greedy (False) hierarchical classification, by default True

Returns:

results – The following metrics are saved in a dictionary with key ‘Try fold_number’ for every fold: the accuracies (acc) and rejection percentage (perc:) for every rejection threshold, the rejection thresholds themselves (steps), the actual values (ytest), the predictions (preds), the probabilities (probs) and the lengths of the predictions (lp) and actual values (lt) per rejection threshold

Return type:

nested dictionary
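
The nested dictionary can be post-processed to obtain one averaged curve over all folds. The sketch below works on a small mock dictionary rather than real output. The inner key names "acc", "perc" and "steps" and the outer keys "Try 0" and "Try 1" are assumptions based on the description above, and mean_curve is a hypothetical helper, not part of the package.

```python
from statistics import mean


def mean_curve(results, acc_key="acc", perc_key="perc", steps_key="steps"):
    """Average accuracy and rejection percentage over folds, per threshold.

    Assumes every fold was evaluated on the same list of thresholds.
    """
    folds = list(results.values())
    steps = folds[0][steps_key]
    mean_acc = [mean(fold[acc_key][i] for fold in folds) for i in range(len(steps))]
    mean_perc = [mean(fold[perc_key][i] for fold in folds) for i in range(len(steps))]
    return steps, mean_acc, mean_perc


# Mock results for two folds and two thresholds (made-up numbers).
mock_results = {
    "Try 0": {"steps": [0.0, 0.5], "acc": [0.70, 0.90], "perc": [0.0, 20.0]},
    "Try 1": {"steps": [0.0, 0.5], "acc": [0.80, 1.00], "perc": [0.0, 40.0]},
}
steps, mean_acc, mean_perc = mean_curve(mock_results)
```

Iterating over results.values() avoids depending on the exact spelling of the fold keys.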

Evaluation.Functions_Accuracy_Reject.Evaluate_AR_parallel(clf_list, Xtests, ytests, predictions, all_jobs, greedy)

Generates data points for accuracy-rejection curves with hierarchical classification in parallel. The rejection threshold is varied with a step size of 0.01.

Parameters:

clf_list (list of scikit-learn classifiers) – Contains the trained classifier for each of the K folds, to be evaluated on the corresponding test set.
Xtests (list of matrices) – Contains the test data per fold
ytests (list of lists) – Contains the actual labels of the test data per fold
predictions (list of lists) – Contains the predictions per fold
all_jobs (int) – Number of CPU cores to parallelise the calculations over
greedy (boolean, optional) – Perform greedy (True) or non-greedy (False) hierarchical classification, by default True

Returns:

results – The following metrics are saved in a dictionary with key ‘Try fold_number’ for every fold: the accuracies (acc) and rejection percentage (perc:) for every rejection threshold, the rejection thresholds themselves (steps), the actual values (ytest), the predictions (preds), the probabilities (probs) and the lengths of the predictions (lp) and actual values (lt) per rejection threshold

Return type:

nested dictionary
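
Because each threshold can be evaluated independently, the sweep is easy to spread over a pool of workers. The sketch below illustrates that idea with the standard library only and does not show how Evaluate_AR_parallel is implemented. The names evaluate_threshold and evaluate_parallel, the n_jobs argument and the simple confidence-based rejection rule are assumptions. A thread pool is used so the example runs anywhere; for CPU-bound prediction, a process pool such as concurrent.futures.ProcessPoolExecutor (same interface) would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def evaluate_threshold(threshold, y_true, y_pred, confidences):
    """Accuracy and rejection percentage at one threshold (illustrative rule)."""
    kept = [(yt, yp) for yt, yp, c in zip(y_true, y_pred, confidences) if c >= threshold]
    perc = 100.0 * (len(y_true) - len(kept)) / len(y_true)
    acc = sum(yt == yp for yt, yp in kept) / len(kept) if kept else float("nan")
    return threshold, acc, perc


def evaluate_parallel(y_true, y_pred, confidences, thresholds, n_jobs=2,
                      executor_cls=ThreadPoolExecutor):
    """Evaluate all thresholds with n_jobs workers; results keep threshold order."""
    worker = partial(evaluate_threshold, y_true=y_true, y_pred=y_pred,
                     confidences=confidences)
    with executor_cls(max_workers=n_jobs) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(worker, thresholds))


thresholds = [i / 100 for i in range(101)]  # step size of 0.01
points = evaluate_parallel(
    ["a", "b", "a", "c"],      # true labels
    ["a", "b", "c", "c"],      # predicted labels
    [0.95, 0.80, 0.40, 0.60],  # confidence per prediction
    thresholds,
    n_jobs=2,
)
```

Each entry of points is a (threshold, accuracy, rejection percentage) tuple, so the list can be unpacked directly into the values needed for a curve.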