macrosynergy.learning.transformers

Collection of custom scikit-learn transformer classes.

class ENetSelector(alpha=1.0, l1_ratio=0.5, positive=True)

Bases: BaseEstimator, SelectorMixin

fit(X, y)

Fit method to estimate an Elastic Net regression and obtain the selected features.

Parameters:
  • X (DataFrame) – Pandas dataframe of input features.

  • y (Union[Series, DataFrame]) – Pandas series or dataframe of targets associated with each sample in X.

get_support(indices=False)

Method to return a mask, or integer index, of the features selected for the Pandas dataframe.

Parameters:

indices (bool) – Whether to return the integer column indices of the selected features instead of a boolean mask.

Return <np.ndarray>:

Boolean mask or integer index of the selected features

get_feature_names_out()

Method to return the names of the selected features by masking the input feature names.
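As in scikit-learn’s SelectorMixin convention, the boolean mask and the integer index are two views of the same selection, and the selected feature names follow from applying the mask to the input column names. The short NumPy sketch below illustrates that relationship; the feature names and the mask are made up for illustration and do not come from a fitted selector.

```python
import numpy as np

# Illustrative inputs: five feature names and a hypothetical selection mask.
feature_names = np.array(["x1", "x2", "x3", "x4", "x5"])
mask = np.array([False, True, False, True, False])

# get_support(indices=False) corresponds to the boolean mask itself ...
support_mask = mask

# ... while get_support(indices=True) corresponds to the positions of the True entries.
support_idx = np.flatnonzero(support_mask)

# get_feature_names_out() amounts to masking the input feature names.
selected_names = feature_names[support_mask]

print(support_idx)     # [1 3]
print(selected_names)  # ['x2' 'x4']
```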

transform(X)

Transform method to return only the selected features of the dataframe.

Parameters:

X (DataFrame) – Pandas dataframe of input features.

Return <pd.DataFrame>:

Pandas dataframe of input features selected based on the Elastic Net’s feature selection capabilities.
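A minimal sketch of Elastic Net feature selection, written directly against scikit-learn’s ElasticNet rather than against ENetSelector itself. It assumes that a feature counts as selected when its fitted coefficient is non-zero, and it uses synthetic panel data in which only the hypothetical feature f1 drives the target. Selecting columns with X.loc keeps the (cid, real_date) multi-index, which mirrors transform returning a dataframe rather than an array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

# Synthetic panel with a (cid, real_date) multi-index; only "f1" drives the target.
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.date_range("2020-01-01", periods=250, freq="D")],
    names=["cid", "real_date"],
)
X = pd.DataFrame(rng.normal(size=(len(idx), 3)), index=idx, columns=["f1", "f2", "f3"])
y = 2.0 * X["f1"] + 0.1 * pd.Series(rng.normal(size=len(idx)), index=idx)

# Elastic Net with the ENetSelector default hyperparameters (non-negative coefficients).
enet = ElasticNet(alpha=1.0, l1_ratio=0.5, positive=True).fit(X, y)

# Assumption: a feature is "selected" when its fitted coefficient is non-zero.
mask = enet.coef_ != 0
X_selected = X.loc[:, mask]  # still a dataframe with the original multi-index

print(list(X_selected.columns))
```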

class LassoSelector(alpha, positive=True)

Bases: BaseEstimator, SelectorMixin

fit(X, y)

Fit method to estimate a Lasso regression and obtain the selected features.

Parameters:
  • X (DataFrame) – Pandas dataframe of input features.

  • y (Union[Series, DataFrame]) – Pandas series or dataframe of targets associated with each sample in X.

get_support(indices=False)

Method to return a mask, or integer index, of the features selected for the Pandas dataframe.

Parameters:

indices (bool) – Whether to return the integer column indices of the selected features instead of a boolean mask.

Return <np.ndarray>:

Boolean mask or integer index of the selected features

get_feature_names_out()

Method to return the names of the selected features by masking the input feature names.

transform(X)

Transform method to return only the selected features of the dataframe.

Parameters:

X (DataFrame) – Pandas dataframe of input features.

Return <pd.DataFrame>:

Pandas dataframe of input features selected based on the Lasso’s feature selection capabilities.
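The alpha parameter (which has no default in LassoSelector) controls how aggressively features are discarded. The sketch below uses scikit-learn’s Lasso directly and assumes, as above, that selected features are those with non-zero coefficients. On synthetic data where only f1 is informative, a moderate penalty keeps f1 while a large penalty shrinks every coefficient to zero. The helper lasso_support and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Synthetic data: only "f1" is informative about the target.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y = 2.0 * X["f1"] + 0.1 * rng.normal(size=500)


def lasso_support(alpha: float, positive: bool = True) -> list:
    """Hypothetical helper: names of features with non-zero Lasso coefficients."""
    coef = Lasso(alpha=alpha, positive=positive).fit(X, y).coef_
    return list(X.columns[coef != 0])


weak = lasso_support(alpha=0.05)    # moderate penalty: the informative feature survives
strong = lasso_support(alpha=10.0)  # large penalty: every coefficient is shrunk to zero
print(weak, strong)
```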

class MapSelector(threshold=0.05, positive=False)

Bases: BaseEstimator, SelectorMixin

fit(X, y)

Fit method to assess the significance of each feature using the Macrosynergy panel test.

Parameters:
  • X (DataFrame) – Pandas dataframe of input features.

  • y (Union[Series, DataFrame]) – Pandas series or dataframe of targets associated with each sample in X.

get_support(indices=False)

Method to return a mask, or integer index, of the features selected for the Pandas dataframe.

Parameters:

indices (bool) – Whether to return the integer column indices of the selected features instead of a boolean mask.

Return <np.ndarray>:

Boolean mask or integer index of the selected features

get_feature_names_out()

Method to return the names of the selected features by masking the input feature names.

transform(X)

Transform method to return only the features found to be significant in the fit method.

Parameters:

X (DataFrame) – Pandas dataframe of input features.

Return <pd.DataFrame>:

Pandas dataframe of input features selected based on the Macrosynergy panel test.
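The Macrosynergy panel test itself is not reproduced here. As a loosely analogous stand-in, the sketch below keeps a feature when a univariate pooled OLS regression of the target on that feature has a two-sided p-value below threshold (using a normal approximation to the t-distribution) and, when positive=True, additionally requires a positive slope. This reading of threshold and positive is an assumption, pooled OLS ignores the cross-sectional structure that a dedicated panel test would typically account for, and the helper names are hypothetical.

```python
import math

import numpy as np
import pandas as pd


def pooled_ols_test(x: np.ndarray, y: np.ndarray) -> tuple:
    """Slope and two-sided p-value of a univariate pooled OLS regression of y on x.

    The p-value uses a normal approximation to the t-distribution.
    """
    xc, yc = x - x.mean(), y - y.mean()
    beta = (xc @ yc) / (xc @ xc)
    resid = yc - beta * xc
    se = math.sqrt((resid @ resid) / (len(x) - 2) / (xc @ xc))
    pval = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, pval


def select_significant(X: pd.DataFrame, y: pd.Series, threshold: float = 0.05,
                       positive: bool = False) -> pd.DataFrame:
    """Hypothetical stand-in for MapSelector: keep features with p-value < threshold."""
    keep = []
    for col in X.columns:
        beta, pval = pooled_ols_test(X[col].to_numpy(), y.to_numpy())
        if pval < threshold and (beta > 0 or not positive):
            keep.append(col)
    return X[keep]


# Synthetic data: "f1" has a positive effect, "f2" a negative one, "f3" is noise.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y = 0.5 * X["f1"] - 0.5 * X["f2"] + rng.normal(size=500)

print(list(select_significant(X, y).columns))                 # f1, f2 (f3 only by chance)
print(list(select_significant(X, y, positive=True).columns))  # f2 dropped for its sign
```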

class FeatureAverager(use_signs=False)

Bases: BaseEstimator, TransformerMixin

fit(X, y=None)

Fit method. Since this transformer simply averages features, no fitting is required.

Parameters:
  • X (DataFrame) – Pandas dataframe of input features.

  • y (Any) – Placeholder for scikit-learn compatibility.

transform(X)

Transform method to average features into a benchmark signal. If use_signs is True, the signs of the benchmark signal are returned.

Parameters:

X (DataFrame) – Pandas dataframe of input features.

Return <pd.DataFrame>:

Pandas dataframe of benchmark signal.

Return type:

DataFrame
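The averaging described above can be sketched with pandas alone. The function below is a hypothetical stand-in rather than the class itself: it takes the equally weighted mean across feature columns for each row, keeps the multi-index, and returns the sign of that average when use_signs is True. The output column name "signal" is an assumption.

```python
import numpy as np
import pandas as pd


def average_features(X: pd.DataFrame, use_signs: bool = False) -> pd.DataFrame:
    """Hypothetical stand-in for FeatureAverager.transform."""
    signal = X.mean(axis=1).to_frame(name="signal")  # equally weighted row-wise mean
    return np.sign(signal) if use_signs else signal


idx = pd.MultiIndex.from_tuples(
    [("AUD", "2020-01-01"), ("AUD", "2020-01-02"), ("CAD", "2020-01-01")],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"f1": [1.0, -2.0, 0.5], "f2": [3.0, 1.0, -1.5]}, index=idx)

print(average_features(X)["signal"].tolist())                  # [2.0, -0.5, -0.5]
print(average_features(X, use_signs=True)["signal"].tolist())  # [1.0, -1.0, -1.0]
```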

class ZnScoreAverager(neutral='zero', use_signs=False)

Bases: BaseEstimator, TransformerMixin

fit(X, y=None)

Fit method to extract the relevant standardisation/normalisation statistics from a training set, so that point-in-time (PiT) statistics can be computed for a hold-out set in the transform method.

Parameters:
  • X (DataFrame) – Pandas dataframe of input features.

  • y (Any) – Placeholder for scikit-learn compatibility.

transform(X)

Transform method to compute an out-of-sample benchmark signal for each unique date in the input test dataframe. At each test date, the relevant statistics (implied by the choice of neutral value) are calculated from all training data plus all test data up to and including that date. Including the test date itself is valid because it denotes the time at which the return was available, and the features already lag behind the returns.

Parameters:

X (DataFrame) – Pandas dataframe of input features.
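The point-in-time logic of transform can be illustrated with a deliberately naive sketch that recomputes the statistics for every test date instead of reusing statistics stored by fit. For each unique test date it pools all training rows with the test rows dated on or before that date, centres on zero (neutral='zero') or otherwise on the pooled mean, scales by the mean absolute deviation from that neutral value, and averages the resulting zn-scores across features. The handling of non-zero neutral values, the choice of dispersion measure, the function name pit_zn_average and the (cid, real_date) index names are all assumptions.

```python
import numpy as np
import pandas as pd


def pit_zn_average(train: pd.DataFrame, test: pd.DataFrame,
                   neutral: str = "zero", use_signs: bool = False) -> pd.Series:
    """Hypothetical sketch: point-in-time zn-scores averaged across features.

    Both frames carry a (cid, real_date) multi-index and share the same columns.
    """
    full = pd.concat([train, test])
    dates = full.index.get_level_values("real_date")
    out = []
    for t in test.index.get_level_values("real_date").unique():
        history = full[dates <= t]                # training data plus test data up to t
        centre = 0.0 if neutral == "zero" else history.mean()
        scale = (history - centre).abs().mean()   # assumed dispersion: mean absolute deviation
        rows = test.xs(t, level="real_date", drop_level=False)
        zn = (rows - centre) / scale
        out.append(zn.mean(axis=1))               # average zn-score across features
    signal = pd.concat(out)
    return np.sign(signal) if use_signs else signal


def _panel(dates, f1, f2):
    idx = pd.MultiIndex.from_product([["AUD"], pd.to_datetime(dates)], names=["cid", "real_date"])
    return pd.DataFrame({"f1": f1, "f2": f2}, index=idx)


train = _panel(["2020-01-01", "2020-01-02"], [1.0, -1.0], [2.0, -2.0])
test = _panel(["2020-01-03", "2020-01-06"], [2.0, 0.0], [4.0, 0.0])

print(pit_zn_average(train, test).tolist())
```

Note that the first test date ignores the later test row entirely; including it would change the scale and hence the signal, which is exactly the look-ahead the point-in-time design avoids.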

class PanelMinMaxScaler

Bases: BaseEstimator, TransformerMixin, OneToOneFeatureMixin

Transformer class to extend scikit-learn’s MinMaxScaler() to panel datasets. It is intended to replicate that class but, critically, returns a Pandas dataframe or series instead of a NumPy array. This preserves the multi-index of the inputs after transformation, so that scaled features can be passed to transformers that require cross-sectional and temporal knowledge.

NOTE: This class should primarily be used to satisfy the input-scaling assumptions of various models.

fit(X, y=None)

Fit method to determine the minimum and maximum values over a training set.

Parameters:
  • X – Pandas dataframe or series.

  • y – Placeholder for scikit-learn compatibility.

Return <PanelMinMaxScaler>:

Fitted PanelMinMaxScaler object.

transform(X)

Transform method to scale a panel using the minimum and maximum values learnt by the fit method.

Parameters:

X – Pandas dataframe or series.

Return <Union[pd.DataFrame, pd.Series]>:

Scaled dataframe or series.
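The behaviour described for PanelMinMaxScaler can be sketched in a few lines of pandas. The hypothetical class below learns column-wise minima and maxima in fit and applies (X - min) / (max - min) in transform; because the arithmetic stays in pandas, the multi-index survives. The sketch does not guard against constant columns (where the maximum equals the minimum).

```python
import pandas as pd


class MinMaxPanelSketch:
    """Hypothetical stand-in for PanelMinMaxScaler (no guard against constant columns)."""

    def fit(self, X, y=None):
        # Column-wise extremes for a dataframe; scalars for a series.
        self.min_ = X.min(axis=0)
        self.max_ = X.max(axis=0)
        return self

    def transform(self, X):
        # Pandas arithmetic aligns on columns and keeps the original (multi-)index.
        return (X - self.min_) / (self.max_ - self.min_)


idx = pd.MultiIndex.from_product(
    [["AUD"], pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])],
    names=["cid", "real_date"],
)
train = pd.DataFrame({"f1": [0.0, 5.0, 10.0], "f2": [2.0, 4.0, 6.0]}, index=idx)
holdout = pd.DataFrame(
    {"f1": [20.0], "f2": [1.0]},
    index=pd.MultiIndex.from_tuples(
        [("AUD", pd.Timestamp("2020-01-06"))], names=["cid", "real_date"]
    ),
)

scaler = MinMaxPanelSketch().fit(train)
print(scaler.transform(train))    # each column mapped onto [0, 1]
print(scaler.transform(holdout))  # hold-out values can fall outside [0, 1]
```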

class PanelStandardScaler(with_mean=True, with_std=True)

Bases: BaseEstimator, TransformerMixin, OneToOneFeatureMixin

fit(X, y=None)

Fit method to determine means and standard deviations over a training set.

Parameters:
  • X (Union[DataFrame, Series]) – Pandas dataframe or series.

  • y (Any) – Placeholder for scikit-learn compatibility.

Return <PanelStandardScaler>:

Fitted PanelStandardScaler object.

transform(X)

Transform method to standardise a panel based on the means and standard deviations learnt from a training set by the fit method.

Parameters:

X (Union[DataFrame, Series]) – Pandas dataframe or series.

Return <Union[pd.DataFrame, pd.Series]>:

Standardised dataframe or series.
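A hypothetical sketch of the standardisation described above, with with_mean and with_std switching the centring and scaling steps on or off. It uses the population standard deviation (ddof=0), which matches scikit-learn’s StandardScaler; whether PanelStandardScaler uses the same convention is an assumption.

```python
import numpy as np
import pandas as pd


class StandardPanelSketch:
    """Hypothetical stand-in for PanelStandardScaler."""

    def __init__(self, with_mean=True, with_std=True):
        self.with_mean = with_mean
        self.with_std = with_std

    def fit(self, X, y=None):
        # Training mean (or 0) and population standard deviation (or 1).
        self.mean_ = X.mean(axis=0) if self.with_mean else 0.0
        self.std_ = X.std(axis=0, ddof=0) if self.with_std else 1.0
        return self

    def transform(self, X):
        # Pandas arithmetic keeps the original (multi-)index and column labels.
        return (X - self.mean_) / self.std_


idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.to_datetime(["2020-01-01", "2020-01-02"])],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": [10.0, 10.0, 30.0, 30.0]}, index=idx)

z = StandardPanelSketch().fit(X).transform(X)
print(z)
```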