Nested Cross Validation Report
This dataclass gathers all relevant information about the training loop in a single place.
It also provides several handy methods for comparing performance across folds, comparing
the best params found on each fold, and iterating over all models (with their corresponding
outer folds) for further custom checks.
It contains one element (one row when converted to a dataframe) for each model trained during
the inner loop, including the model refitted on the whole inner dataset. That is, if the number
of outer folds is k_o, the number of inner folds is k_i, and s_o and s_i folds are skipped
respectively, then the number of rows is (k_o - s_o) * (k_i - s_i + 1), where the + 1 accounts
for refitting the best model on the whole (X_outer_train, y_outer_train).
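For instance, with k_o = 5 outer folds, k_i = 4 inner folds and no skipped folds (s_o = s_i = 0), the report holds 5 * (4 - 0 + 1) = 25 rows: 20 inner-loop candidates plus one refitted best model per outer fold.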
The attributes section contains all available information for each model (row).
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `best` | `list(bool)` | List of booleans where True marks the models that have been refitted and trained on the whole (X_outer_train, y_outer_train). |
| `outer_kfold` | `list(int)` | List of integers mapping each model to its corresponding outer fold. |
| `params` | `list(dict)` | List of dictionaries holding the parameters of each model. |
| `inner_validation_metrics` | `dict(list)` | Dictionary containing, for each metric, a list of values averaged over all inner validation folds. |
| `outer_test_metrics` | `dict(list)` | Dictionary containing, for each metric, a list of values computed over the outer folds (only available for the best model of each outer fold). This is the most important piece for evaluating performance. |
| `model` | `list(estimator)` | List of models (only available for the best model of each outer fold). |
| `outer_test_indexes` | `list(ndarray)` | List of test indexes of the corresponding outer fold for each model. Together with `model`, this makes further custom checks easy. |
Source code in nestedcvtraining/utils/reporting.py
```python
# Imports this excerpt relies on (in reporting.py they live outside this snippet);
# _extend_nested_dict is a private helper defined elsewhere in the same module.
import copy
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import chain

import numpy as np
import pandas as pd


@dataclass
class Report:
    """Nested Cross Validation Report

    This dataclass gathers all relevant information about the training loop in a single place.
    It also provides several handy methods for comparing performance across folds, comparing
    the best params found on each fold, and iterating over all models (with their
    corresponding outer folds) for further custom checks.

    It contains one element (one row when converted to a dataframe) for each model trained
    during the inner loop, including the model refitted on the whole inner dataset. That is,
    if the number of outer folds is k_o, the number of inner folds is k_i, and s_o and s_i
    folds are skipped respectively, then the number of rows is
    (k_o - s_o) * (k_i - s_i + 1), where the + 1 accounts for refitting the best model on
    the whole (X_outer_train, y_outer_train).

    The attributes section contains all available information for each model (row).

    Attributes:
        best (list(bool)): list of booleans where True marks the models that have been
            refitted and trained on the whole (X_outer_train, y_outer_train).
        outer_kfold (list(int)): list of integers mapping each model to its corresponding
            outer fold.
        params (list(dict)): list of dictionaries holding the parameters of each model.
        inner_validation_metrics (dict(list)): dictionary containing, for each metric, a
            list of values averaged over all inner validation folds.
        outer_test_metrics (dict(list)): dictionary containing, for each metric, a list of
            values computed over the outer folds (only available for the best model of each
            outer fold). This is the most important piece for evaluating performance.
        model (list(estimator)): list of models (only available for the best model of each
            outer fold).
        outer_test_indexes (list(ndarray)): list of test indexes of the corresponding outer
            fold for each model. This attribute, together with the previous one, is output
            to make it easier to perform further checks.
    """

    best: list = field(default_factory=list)
    outer_kfold: list = field(default_factory=list)
    params: list = field(default_factory=list)
    inner_validation_metrics: dict = field(default_factory=lambda: defaultdict(list))
    outer_test_metrics: dict = field(default_factory=lambda: defaultdict(list))
    model: list = field(default_factory=list)
    outer_test_indexes: list = field(default_factory=list)

    def to_dataframe(self):
        """It converts the information to a dataframe.

        Returns:
            df (dataframe): A dataframe with one row per trained model, with all attached information.
        """
        self_dict = copy.deepcopy(self.__dict__)
        inner_validation_metrics = self_dict.pop("inner_validation_metrics")
        outer_test_metrics = self_dict.pop("outer_test_metrics")
        all_params = self_dict.pop("params")
        all_params_keys = sorted(set(chain.from_iterable([[ik for ik in ks] for ks in all_params])))
        for param_key in all_params_keys:
            self_dict["param__" + param_key] = []
            for param in all_params:
                self_dict["param__" + param_key].append(param.get(param_key, None))
        for k in inner_validation_metrics:
            self_dict["inner_validation_metrics__" + k] = inner_validation_metrics[k]
        for k in outer_test_metrics:
            self_dict["outer_test_metrics__" + k] = outer_test_metrics[k]
        return pd.DataFrame.from_dict(self_dict)

    def _append(
        self, best, outer_kfold, params, inner_validation_metrics, outer_test_metrics, model, outer_test_indexes
    ):
        self.best.append(best)
        self.outer_kfold.append(outer_kfold)
        self.params.append(params)
        self.model.append(model)
        self.outer_test_indexes.append(outer_test_indexes)
        _extend_nested_dict(self.inner_validation_metrics, inner_validation_metrics)
        _extend_nested_dict(self.outer_test_metrics, outer_test_metrics)

    def _extend(self, other):
        self.best.extend(other.best)
        self.outer_kfold.extend(other.outer_kfold)
        self.params.extend(other.params)
        self.model.extend(other.model)
        self.outer_test_indexes.extend(other.outer_test_indexes)
        _extend_nested_dict(self.inner_validation_metrics, other.inner_validation_metrics)
        _extend_nested_dict(self.outer_test_metrics, other.outer_test_metrics)

    def _best_idx(self):
        for idx, best in enumerate(self.best):
            if best:
                yield idx

    def get_best_params(self, only_diff=False):
        """It outputs the parameters of all best models, to check to what extent they differ.

        Args:
            only_diff (bool): If True, only the parameters where there is a difference among all models
                are output. Otherwise, all params are output.

        Returns:
            params (dict): A dictionary where the keys are the param names and the values are
                lists of corresponding values (one per best model).
        """
        best_idxs = [idx for idx in self._best_idx()]
        params = defaultdict(list)
        for idx in best_idxs:
            for k in self.params[idx]:
                params[k].append(self.params[idx][k])
        if only_diff:
            return {
                k: v
                for k, v in params.items()
                if len(set(v)) > 1
            }
        else:
            return dict(params)

    def _filter_only_best(self, alist):
        return [
            alist[idx]
            for idx in self._best_idx()
        ]

    def get_outer_metrics_report(self):
        """It outputs a basic report of outer metrics. For each outer metric, it outputs
        the mean, sd and range.

        Returns:
            metrics_report (dict): A nested dictionary where outer keys are metric names and
                inner keys are `mean`, `sd`, `min`, `max`.
        """
        metric_report = {}
        for metric in self.outer_test_metrics:
            metric_report[metric] = {
                "mean": np.mean(self._filter_only_best(self.outer_test_metrics[metric])),
                "sd": np.std(self._filter_only_best(self.outer_test_metrics[metric])),
                "min": min(self._filter_only_best(self.outer_test_metrics[metric])),
                "max": max(self._filter_only_best(self.outer_test_metrics[metric])),
            }
        return metric_report

    def iter_models_test_idxs(self):
        """Iterator that yields all refitted models with their corresponding
        outer test indexes, so that it's easy to compute other metrics or make
        some custom plots to check the performance.

        Yields:
            t (tuple): a tuple containing model and test_idxs
        """
        for t in zip(self._filter_only_best(self.model), self._filter_only_best(self.outer_test_indexes)):
            yield t
```
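To make the structure concrete, here is a minimal sketch that fills a `Report` by hand with placeholder values. A real report is produced by the library's training loop; the metric name `auc`, the params, and the scores below are made up for illustration.

```python
import numpy as np

from nestedcvtraining.utils.reporting import Report

# A hand-filled Report, only to show the API surface; in real runs the
# nested-CV training loop populates these fields. Three rows: two inner-fold
# candidates plus the best model refitted on the whole outer training split
# (marked best=True).
report = Report()
report.best = [False, False, True]
report.outer_kfold = [0, 0, 0]
report.params = [{"max_depth": 2}, {"max_depth": 5}, {"max_depth": 5}]
report.inner_validation_metrics = {"auc": [0.84, 0.90, 0.90]}
report.outer_test_metrics = {"auc": [np.nan, np.nan, 0.88]}  # only best rows carry values
report.model = [None, None, "a fitted estimator in real runs"]
report.outer_test_indexes = [None, None, np.array([0, 1, 2])]

df = report.to_dataframe()       # one row per trained model
print(df[df["best"]])            # keep only the refitted best models
print(report.get_best_params())  # {'max_depth': [5]}
```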
get_best_params
get_best_params(only_diff=False)
It outputs the parameters of all best models, to check to what extent they differ.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `only_diff` | `bool` | If True, only the parameters where there is a difference among all models are output. Otherwise, all params are output. | `False` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `params` | `dict` | A dictionary where the keys are the param names and the values are lists of corresponding values (one per best model). |
Source code in nestedcvtraining/utils/reporting.py
```python
def get_best_params(self, only_diff=False):
""" It outputs the parameters of all best models, to check to what extent they differ.
Args:
only_diff (bool): If True, only the parameters where thhere is a difference among all models
are output. Otherwise, all params are output.
Returns:
params (dict): A dictionary where the keys are the param names and the values are
lists of corresponding values (one per best model).
"""
best_idxs = [idx for idx in self._best_idx()]
params = defaultdict(list)
for idx in best_idxs:
for k in self.params[idx]:
params[k].append(self.params[idx][k])
if only_diff:
return {
k: v
for k, v in params.items()
if len(set(v)) > 1
}
else:
return dict(params)
|
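The effect of `only_diff`, sketched on a hypothetical report whose three best models used the placeholder params below:

```python
# Suppose the best models of three outer folds ended up with these params
# (illustrative values only).
report.best = [True, True, True]
report.params = [
    {"max_depth": 3, "criterion": "gini"},
    {"max_depth": 5, "criterion": "gini"},
    {"max_depth": 3, "criterion": "gini"},
]

report.get_best_params()
# -> {'max_depth': [3, 5, 3], 'criterion': ['gini', 'gini', 'gini']}
report.get_best_params(only_diff=True)
# -> {'max_depth': [3, 5, 3]}  ('criterion' is dropped: identical on all folds)
```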
get_outer_metrics_report
get_outer_metrics_report()
It outputs a basic report of outer metrics. For each outer metric, it outputs
the mean, sd and range.
Returns:

| Name | Type | Description |
| --- | --- | --- |
| `metrics_report` | `dict` | A nested dictionary where outer keys are metric names and inner keys are `mean`, `sd`, `min`, `max`. |
Source code in nestedcvtraining/utils/reporting.py
```python
def get_outer_metrics_report(self):
""" It outputs a basic report of outer metrics. For each outer metric, it outputs
the mean, sd and range.
Returns:
metrics_report (dict): A nested dictionary where outer keys are metric names and
inner keys are `mean`, `sd`, `min`, `max`.
"""
metric_report = {}
for metric in self.outer_test_metrics:
metric_report[metric] = {
"mean": np.mean(self._filter_only_best(self.outer_test_metrics[metric])),
"sd": np.std(self._filter_only_best(self.outer_test_metrics[metric])),
"min": min(self._filter_only_best(self.outer_test_metrics[metric])),
"max": max(self._filter_only_best(self.outer_test_metrics[metric])),
}
return metric_report
|
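The shape of the result, on made-up AUC scores for three outer folds:

```python
# Placeholder outer-fold AUCs for three best models (illustrative values only).
report.best = [True, True, True]
report.outer_test_metrics = {"auc": [0.86, 0.90, 0.88]}

report.get_outer_metrics_report()
# -> {'auc': {'mean': 0.88, 'sd': 0.0163..., 'min': 0.86, 'max': 0.9}}
```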
iter_models_test_idxs
iter_models_test_idxs()
Iterator that yields all refitted models with their corresponding outer test indexes, so that it's easy to compute other metrics or make some custom plots to check the performance.
Yields:

| Name | Type | Description |
| --- | --- | --- |
| `t` | `tuple` | A tuple containing model and test_idxs. |
Source code in nestedcvtraining/utils/reporting.py
```python
def iter_models_test_idxs(self):
""" Iterator that yields all refitted models with their corresponding
outer test indexes, so that it's easy to compute other metrics or make
some custom plots to check the performance.
Yields:
t (tuple): a tuple containing model and test_idxs
"""
for t in zip(self._filter_only_best(self.model), self._filter_only_best(self.outer_test_indexes)):
yield t
|
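This iterator is the natural hook for custom evaluation. A sketch, assuming `X` and `y` are the full arrays the nested CV was run on and each yielded model is a fitted scikit-learn-style classifier:

```python
from sklearn.metrics import balanced_accuracy_score

# Score each refitted best model on its own outer test split.
for model, test_idxs in report.iter_models_test_idxs():
    y_pred = model.predict(X[test_idxs])
    print(balanced_accuracy_score(y[test_idxs], y_pred))
```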
to_dataframe
to_dataframe()
It converts the information to a dataframe.
Returns:

| Name | Type | Description |
| --- | --- | --- |
| `df` | `dataframe` | A dataframe with one row per trained model, with all attached information. |
Source code in nestedcvtraining/utils/reporting.py
```python
def to_dataframe(self):
""" It converts the information to a dataframe.
Returns:
df (dataframe): A dataframe with one row per trained model, with all attached information.
"""
self_dict = copy.deepcopy(self.__dict__)
inner_validation_metrics = self_dict.pop("inner_validation_metrics")
outer_test_metrics = self_dict.pop("outer_test_metrics")
all_params = self_dict.pop("params")
all_params_keys = sorted(set(chain.from_iterable([[ik for ik in ks] for ks in all_params])))
for param_key in all_params_keys:
self_dict["param__" + param_key] = []
for param in all_params:
self_dict["param__" + param_key].append(param.get(param_key, None))
for k in inner_validation_metrics:
self_dict["inner_validation_metrics__" + k] = inner_validation_metrics[k]
for k in outer_test_metrics:
self_dict["outer_test_metrics__" + k] = outer_test_metrics[k]
return pd.DataFrame.from_dict(self_dict)
|
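The resulting columns follow the `param__`, `inner_validation_metrics__` and `outer_test_metrics__` prefixes built above, which makes filtering straightforward. In the sketch below, `auc` and `max_depth` are placeholders for whatever metrics and params your run actually tracked:

```python
df = report.to_dataframe()

# Restrict to the refitted best model of each outer fold and inspect
# its parameters and outer-fold score.
best_df = df[df["best"]]
print(best_df[["outer_kfold", "param__max_depth", "outer_test_metrics__auc"]])
```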