macrosynergy.learning.cv_tools
A set of tools for cross-validation of panel data.
NOTE: This module is under development, and is not yet ready for production use.
- panel_cv_scores(X, y, splitter, estimators, scoring, show_longbias=True, show_std=False, verbose=1, n_jobs=-1)
Returns a dataframe of cross-validation scores.
- Parameters:
X (DataFrame) – Dataframe of features multi-indexed by (cross-section, date). The dataframe must be in wide format: each feature is a column. The dates must be in datetime format.
y (Union[DataFrame, Series]) – Dataframe of the target variable, multi-indexed by (cross-section, date). The dates must be in datetime format.
splitter (BasePanelSplit) – splitter object of a class inheriting from BasePanelSplit.
estimators (dict) – dictionary of estimators, where the keys are the estimator names and the values are the sklearn estimator objects.
scoring (dict) – dictionary of scoring metrics, where the keys are the metric names and the values are callables.
show_longbias (Optional[bool]) – boolean specifying whether or not to display the proportion of positive returns. Default is True.
show_std (Optional[bool]) – boolean specifying whether or not to show the standard deviation of the cross-validation scores. Default is False.
verbose (Optional[int]) – integer specifying verbosity of the cross-validation process. Default is 1.
n_jobs (Optional[int]) – integer specifying the number of jobs to run in parallel. Default is -1, which uses all cores.
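The input layout described for X and y can be illustrated with a minimal sketch. The cross-section codes, feature names, index level names and random values below are placeholders chosen for illustration, not something this page prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-sections, dates and feature names (assumptions for illustration).
cids = ["AUD", "CAD", "GBP"]
dates = pd.date_range("2020-01-01", periods=24, freq="MS")

# Multi-index of (cross-section, date); the level names are an assumption.
index = pd.MultiIndex.from_product([cids, dates], names=["cid", "real_date"])

rng = np.random.default_rng(0)

# Wide-format features: one column per feature, one row per (cross-section, date).
X = pd.DataFrame(
    rng.normal(size=(len(index), 2)),
    index=index,
    columns=["FEATURE_A", "FEATURE_B"],
)

# Target variable sharing the same multi-index as X.
y = pd.Series(rng.normal(size=len(index)), index=index, name="TARGET")

# The date level should hold datetimes rather than strings.
dates_ok = pd.api.types.is_datetime64_any_dtype(X.index.get_level_values(1))
```

Frames shaped like this would then be passed as the `X` and `y` arguments, together with a splitter, an estimator dictionary and a scoring dictionary.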
- Return <pd.DataFrame> metrics_df:
dataframe of the means and standard deviations of the cross-validation metrics for each sklearn estimator, computed over the walk-forward validation splits.
N.B.: The returned dataframe is multi-indexed. The outer index level is the metric, and the inner index level is the statistic (mean or standard deviation) of that metric over the walk-forward validation splits. The columns are the estimators.
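To make the described layout concrete, the sketch below builds a dataframe of the same shape by hand from per-split scores. It does not reproduce the library's internals: the helper `summarise_cv_scores`, the index level names, the row labels "mean" and "std", the use of the population standard deviation, and the score values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def summarise_cv_scores(split_scores):
    """Aggregate {metric: {estimator: [score per split]}} into a summary frame.

    Hypothetical helper: rows are (metric, statistic) pairs, columns are estimators.
    """
    index_tuples, records = [], []
    for metric, by_estimator in split_scores.items():
        # One row for the mean of each estimator's scores across splits.
        index_tuples.append((metric, "mean"))
        records.append({name: np.mean(v) for name, v in by_estimator.items()})
        # One row for the standard deviation (population form, an assumption).
        index_tuples.append((metric, "std"))
        records.append({name: np.std(v) for name, v in by_estimator.items()})
    index = pd.MultiIndex.from_tuples(index_tuples, names=["metric", "statistic"])
    return pd.DataFrame(records, index=index)


# Placeholder per-split scores for two hypothetical estimators.
split_scores = {
    "accuracy": {"OLS": [0.5, 0.6, 0.7], "Ridge": [0.4, 0.6, 0.8]},
}
metrics_df = summarise_cv_scores(split_scores)
```

With this layout, a single value can be read with a row tuple and a column label, for example `metrics_df.loc[("accuracy", "mean"), "OLS"]`.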