macrosynergy.learning.cv_tools

A set of tools for cross-validation of panel data.

NOTE: This module is under development, and is not yet ready for production use.

panel_cv_scores(X, y, splitter, estimators, scoring, show_longbias=True, show_std=False, verbose=1, n_jobs=-1)

Returns a dataframe of cross-validation scores.

Parameters:

X (DataFrame) – Dataframe of features multi-indexed by (cross-section, date). The dataframe must be in wide format: each feature is a column. The dates must be in datetime format.
y (Union[DataFrame, Series]) – Dataframe or series of the target variable, multi-indexed by (cross-section, date). The dates must be in datetime format.
splitter (BasePanelSplit) – splitter object of a class inheriting from BasePanelSplit.
estimators (dict) – dictionary of estimators, where the keys are the estimator names and the values are the sklearn estimator objects.
scoring (dict) – dictionary of scoring metrics, where the keys are the metric names and the values are callables.
show_longbias (Optional[bool]) – boolean specifying whether or not to display the proportion of positive returns. Default is True.
show_std (Optional[bool]) – boolean specifying whether or not to show the standard deviation of the cross-validation scores. Default is False.
verbose (Optional[int]) – integer specifying verbosity of the cross-validation process. Default is 1.
n_jobs (Optional[int]) – integer specifying the number of jobs to run in parallel. Default is -1, which uses all cores.
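
The input shapes described above can be sketched as follows. This is a minimal illustration on synthetic data, not an excerpt from the library: the index level names "cid" and "real_date", the feature names, the two estimators and the use of sklearn's make_scorer to build the scoring callables are all assumptions made for the example. The call to panel_cv_scores is left commented out because it needs the macrosynergy package and a concrete BasePanelSplit subclass, neither of which is specified in this section.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score

# Synthetic panel: 3 cross-sections observed on 50 business days.
# The level names "cid" and "real_date" are assumptions for this sketch.
cids = ["AUD", "CAD", "GBP"]
dates = pd.date_range("2020-01-01", periods=50, freq="B")
index = pd.MultiIndex.from_product([cids, dates], names=["cid", "real_date"])

rng = np.random.default_rng(0)

# Wide format: one column per feature, rows indexed by (cross-section, date).
X = pd.DataFrame(
    rng.normal(size=(len(index), 2)),
    index=index,
    columns=["feature_1", "feature_2"],
)

# Target series sharing the same (cross-section, date) index as X.
noise = rng.normal(scale=0.1, size=len(index))
y = (0.5 * X["feature_1"] + noise).rename("target")

# Estimator names mapped to sklearn estimator objects.
estimators = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
}

# Metric names mapped to callables. Wrapping sklearn metrics with
# make_scorer is an assumption; the section only says "callables".
scoring = {
    "r2": make_scorer(r2_score),
    "mae": make_scorer(mean_absolute_error, greater_is_better=False),
}

# Hypothetical call, commented out because `splitter` must be an instance
# of a BasePanelSplit subclass from macrosynergy (not constructed here):
# metrics_df = panel_cv_scores(
#     X, y, splitter=splitter, estimators=estimators, scoring=scoring
# )
```

Building X and y from the same MultiIndex keeps the features and the target aligned row by row.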

Returns:

metrics_df (pd.DataFrame) – dataframe comprising the means and standard deviations of the cross-validation metrics for each sklearn estimator, over the walk-forward history.

N.B.: The returned performance metrics dataframe is multi-indexed: the outer index level is the metric, and the inner index level is the statistic (mean or standard deviation) of that metric over the walk-forward validation splits. The columns are the estimators.
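
The layout described in the note can be mocked up with plain pandas to show how such a frame is typically queried. This is only a sketch: the per-split scores are random placeholders rather than real results, and the inner-level labels "mean" and "std", the level names "metric" and "statistic", and the estimator names are assumptions that may differ from what panel_cv_scores actually emits.

```python
import numpy as np
import pandas as pd

# Placeholder per-split scores (random numbers, NOT real results):
# 5 validation splits for each (metric, estimator) pair.
rng = np.random.default_rng(0)
metrics = ["mae", "r2"]
estimators = ["ols", "ridge"]
fold_scores = {
    (m, e): rng.uniform(0.0, 1.0, size=5) for m in metrics for e in estimators
}

# Outer index level: metric. Inner index level: statistic over the splits.
# The labels "mean" and "std" are assumed for illustration.
index = pd.MultiIndex.from_product(
    [metrics, ["mean", "std"]], names=["metric", "statistic"]
)
metrics_df = pd.DataFrame(index=index, columns=estimators, dtype=float)

for (m, e), scores in fold_scores.items():
    metrics_df.loc[(m, "mean"), e] = scores.mean()
    # NumPy's default population standard deviation (ddof=0); whether the
    # library uses this or the sample version is not stated in this section.
    metrics_df.loc[(m, "std"), e] = scores.std()

# Both statistics for a single metric (rows "mean" and "std").
r2_rows = metrics_df.loc["r2"]

# The mean of every metric, one row per metric and one column per estimator.
means = metrics_df.xs("mean", level="statistic")

print(metrics_df)
```

Selecting on the outer level with .loc or on the inner level with .xs avoids reshaping the frame when comparing estimators on a single metric or statistic.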