macrosynergy.panel.make_zn_scores

Module for calculating z-scores for a panel around a neutral level (“zn scores”).

expanding_stat(df, dates_iter, stat='mean', sequential=True, min_obs=261, iis=True)

Compute statistic based on an expanding sample.

Parameters:

df (DataFrame) – Daily-frequency time series DataFrame.
dates_iter (DatetimeIndex) – controls the frequency of the neutral & mean absolute deviation calculations.
stat (Union[str, Number]) – statistical method to be applied. This is typically ‘mean’ or ‘median’.
sequential (bool) – if True (default), the statistic is estimated sequentially. If this is set to False, a single value is calculated per time series, based on the full sample.
min_obs (int) – minimum required observations for calculation of the statistic in days.
iis (bool) – if set to True, the values of the initial interval determined by min_obs will be estimated in-sample, based on the full initial sample.

Return <pd.DataFrame> df_out:

Time series dataframe of the chosen statistic across all columns

Return type:

DataFrame
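
The following is a minimal pandas sketch of what an expanding statistic with an in-sample initial window can look like. It is not the library's implementation. The helper name `expanding_stat_sketch`, the pooling of all columns into one statistic, and the requirement that `dates_iter` be a subset of the index of `df` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def expanding_stat_sketch(df, dates_iter, stat="mean", min_obs=3, iis=True):
    """Expanding panel statistic (hypothetical helper, not the library code)."""
    func = {"mean": np.nanmean, "median": np.nanmedian}[stat]
    out = pd.Series(np.nan, index=df.index)
    for date in dates_iter:
        # Use only the rows available up to and including the estimation date.
        window = df.loc[:date]
        if len(window) >= min_obs:
            # Assumption: all columns are pooled into a single statistic.
            out.loc[date] = func(window.to_numpy())
    # Carry the latest estimate forward between re-estimation dates.
    out = out.ffill()
    if iis:
        # Fill the initial window with the first available estimate (in-sample).
        out = out.bfill()
    return out


dates = pd.bdate_range("2024-01-01", periods=6)
panel = pd.DataFrame(
    {"AUD": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "CAD": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=dates,
)
result = expanding_stat_sketch(panel, dates_iter=dates, min_obs=3)
no_iis = expanding_stat_sketch(panel, dates_iter=dates, min_obs=3, iis=False)
```

With `iis=True` the first two dates receive the estimate from the third date; with `iis=False` they stay empty.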

make_zn_scores(df, xcat, cids=None, start=None, end=None, blacklist=None, sequential=True, min_obs=261, iis=True, neutral='zero', est_freq='D', thresh=None, pan_weight=1, postfix='ZN')

Computes z-scores for a panel around a neutral level (“zn scores”).

Parameters:

df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcat (str) – extended category for which the zn_score is calculated.
cids (List[str]) – cross sections for which zn_scores are calculated; default is all available for category.
start (str) – earliest date in ISO format. Default is None and earliest date in df is used.
end (str) – latest date in ISO format. Default is None and latest date in df is used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the calculation of zn-scores. This means that not only are no zn-score values calculated for these periods, but also that they are not used for the scoring of other periods. N.B.: The argument is a dictionary with cross-sections as keys and tuples of start and end dates of the blacklist periods in ISO format as values. If one cross section has multiple blacklist periods, numbers are added to the keys (e.g. TRY_1, TRY_2, etc.)
sequential (bool) – if True (default) score parameters (neutral level and mean absolute deviation) are estimated sequentially with concurrently available information only.
min_obs (int) – the minimum number of observations required to calculate zn_scores. Default is 261. The parameter is only applicable if the “sequential” parameter is set to True. Otherwise the neutral level and the mean absolute deviation are both computed in-sample and will use the full sample.
iis (bool) – if True (default), zn-scores are also calculated for the initial sample period defined by min_obs on an in-sample basis to avoid losing history. This is irrelevant if sequential is set to False.
neutral (Union[str, Number]) – method to determine neutral level. Default is ‘zero’. Alternatives are ‘mean’, ‘median’ or a number.
est_freq (str) – the frequency at which mean absolute deviations or means are re-estimated. The options are daily, weekly, monthly and quarterly: “D”, “W”, “M”, “Q”. Default is daily. Re-estimation is performed at period end.
thresh (float) – threshold value beyond which scores are winsorized, i.e. capped at that threshold. The threshold is the maximum absolute score value that the function is allowed to produce. The minimum threshold is 1 mean absolute deviation.
pan_weight (float) – weight of panel (versus individual cross section) for calculating the z-score parameters, i.e. the neutral level and the mean absolute deviation. Default is 1, i.e. panel data are the basis for the parameters. Lowest possible value is 0, i.e. parameters are all specific to cross section.
postfix (str) – string appended to category name for output; default is “ZN”.
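
The blacklist format described above can be illustrated with a small example. This is a sketch under assumptions: the cross sections, dates, and the masking loop are hypothetical, and the loop only shows one way to interpret numbered keys such as TRY_1. It is not taken from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical blacklist: two excluded periods for TRY, one for RUB.
blacklist = {
    "TRY_1": ("2024-01-02", "2024-01-03"),
    "TRY_2": ("2024-01-08", "2024-01-08"),
    "RUB": ("2024-01-04", "2024-01-05"),
}

dates = pd.bdate_range("2024-01-01", periods=6)
dfw = pd.DataFrame(1.0, index=dates, columns=["TRY", "RUB", "USD"])

masked = dfw.copy()
for key, (start, end) in blacklist.items():
    cid = key.split("_")[0]  # "TRY_1" -> "TRY"
    # Blank out the period so that it is neither scored nor used for estimation.
    masked.loc[start:end, cid] = np.nan
```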

Return <pd.DataFrame>:

standardized DataFrame with the zn-scores of the chosen xcat: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.

Return type:

DataFrame
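
To make the scoring concrete, the sketch below computes full-sample zn-scores (the non-sequential case) with plain pandas: deviations from a neutral level divided by the mean absolute deviation around that level, with optional winsorization. The helper `zn_score_sketch` is hypothetical. Blending the panel-based and cross-section-based scores linearly with `pan_weight` is an assumption about how the weight could be applied, not a statement of the library's method.

```python
import numpy as np
import pandas as pd


def zn_score_sketch(dfw, neutral=0.0, thresh=None, pan_weight=1.0):
    """Full-sample zn-scores for a wide (date x cross-section) frame."""
    dev = dfw - neutral  # deviations from the neutral level
    mad_pan = np.nanmean(np.abs(dev.to_numpy()))  # one panel-wide scale
    mad_cs = dev.abs().mean(axis=0)  # one scale per cross section
    # Assumption: panel and cross-section scores are blended linearly.
    zn = pan_weight * (dev / mad_pan) + (1 - pan_weight) * (dev / mad_cs)
    if thresh is not None:
        # Winsorize: cap absolute scores at the threshold.
        zn = zn.clip(lower=-thresh, upper=thresh)
    return zn


# Long-format input with the columns named in the documentation above.
dates = pd.bdate_range("2024-01-01", periods=4)
df = pd.DataFrame(
    {
        "cid": ["AUD"] * 4 + ["CAD"] * 4,
        "xcat": ["XR"] * 8,
        "real_date": list(dates) * 2,
        "value": [1.0, -1.0, 2.0, -2.0, 2.0, -2.0, 4.0, -4.0],
    }
)
dfw = df.pivot(index="real_date", columns="cid", values="value")

zn_pan = zn_score_sketch(dfw, neutral=0.0, thresh=1.5, pan_weight=1.0)
zn_cs = zn_score_sketch(dfw, neutral=0.0, pan_weight=0.0)
```

In this toy panel CAD is twice as volatile as AUD. With `pan_weight=0.0` both cross sections get identical scores, while with `pan_weight=1.0` CAD's scores are larger and are capped at the threshold.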