macrosynergy.learning.cv_tools

A set of tools for cross-validation of panel data.

NOTE: This module is under development, and is not yet ready for production use.

panel_cv_scores(X, y, splitter, estimators, scoring, show_longbias=True, show_std=False, verbose=1, n_jobs=-1)

Returns a dataframe of cross-validation scores.

Parameters:
  • X (DataFrame) – Dataframe of features multi-indexed by (cross-section, date). The dataframe must be in wide format: each feature is a column. The dates must be in datetime format.

  • y (Union[DataFrame, Series]) – Dataframe or series of the target variable, multi-indexed by (cross-section, date). The dates must be in datetime format.

  • splitter (BasePanelSplit) – splitter object of a class inheriting from BasePanelSplit.

  • estimators (dict) – dictionary of estimators, where the keys are the estimator names and the values are the sklearn estimator objects.

  • scoring (dict) – dictionary of scoring metrics, where the keys are the metric names and the values are callables.

  • show_longbias (Optional[bool]) – boolean specifying whether or not to display the proportion of positive returns. Default is True.

  • show_std (Optional[bool]) – boolean specifying whether or not to show the standard deviation of the cross-validation scores. Default is False.

  • verbose (Optional[int]) – integer specifying verbosity of the cross-validation process. Default is 1.

  • n_jobs (Optional[int]) – integer specifying the number of jobs to run in parallel. Default is -1, which uses all cores.
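The layout required for X and y can be sketched as follows. This is a minimal illustration built with pandas only. The cross-section codes, feature names, target name and index level names (`cid`, `real_date`) are assumptions made for the example and are not taken from the documentation above.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-sections and dates, for illustration only.
cids = ["AUD", "CAD", "GBP"]
dates = pd.date_range("2020-01-01", periods=6, freq="B")  # datetime index level

# Two-level index ordered as (cross-section, date).
# The level names are assumptions, not documented requirements.
index = pd.MultiIndex.from_product([cids, dates], names=["cid", "real_date"])

rng = np.random.default_rng(0)

# Wide format: one column per feature.
X = pd.DataFrame(
    rng.normal(size=(len(index), 2)),
    index=index,
    columns=["FEATURE_A", "FEATURE_B"],
)

# Target as a series sharing the same multi-index.
y = pd.Series(rng.normal(size=len(index)), index=index, name="TARGET")

# Sanity checks mirroring the documented requirements.
assert isinstance(X.index, pd.MultiIndex)
assert pd.api.types.is_datetime64_any_dtype(X.index.get_level_values("real_date"))
assert X.index.equals(y.index)
```

Checking the index type and date dtype before calling the function is a cheap way to catch inputs that are still in long format or carry string dates.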

Returns:
  • metrics_df (DataFrame) – dataframe comprising means and standard deviations of cross-validation metrics for each sklearn estimator, over the walk-forward history.

N.B.: The returned performance-metrics dataframe is multi-indexed: the outer index level is the metric and the inner index level is the statistic (mean or standard deviation) of that metric over the walk-forward validation splits. The columns are the estimators.
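The shape of the returned dataframe can be illustrated with a small pandas sketch. It does not call `panel_cv_scores`. The per-split scores, the estimator and metric names, and the index level names (`metric`, `statistic`) are hypothetical. The use of the sample standard deviation (pandas' default) is also an assumption.

```python
import pandas as pd

# Hypothetical per-split scores: {metric: {estimator: [one score per split]}}.
split_scores = {
    "accuracy": {"ols": [0.52, 0.55, 0.49], "ridge": [0.53, 0.54, 0.51]},
    "r2": {"ols": [0.01, 0.03, -0.02], "ridge": [0.02, 0.02, 0.00]},
}

rows = {}
for metric, by_estimator in split_scores.items():
    # Rows are validation splits, columns are estimators.
    per_split = pd.DataFrame(by_estimator)
    rows[(metric, "mean")] = per_split.mean()
    rows[(metric, "std")] = per_split.std()  # sample std (ddof=1), an assumption

# Tuple keys become a two-level index after transposing:
# outer level = metric, inner level = statistic, columns = estimators.
metrics_df = pd.DataFrame(rows).T
metrics_df.index.names = ["metric", "statistic"]

# Select one statistic of one metric for one estimator.
mean_accuracy_ols = metrics_df.loc[("accuracy", "mean"), "ols"]
```

With this layout, `metrics_df.loc["accuracy"]` returns the mean and standard deviation rows for a single metric across all estimators.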