macrosynergy.panel.category_relations

Classes and functions for analyzing and visualizing the relations of two panel categories.

class CategoryRelations(df, xcats, cids=None, val='value', start=None, end=None, blacklist=None, years=None, freq='M', lag=0, fwin=1, xcat_aggs=['mean', 'mean'], xcat1_chg=None, n_periods=1, xcat_trims=[None, None], slip=0)

Bases: object

Class for analyzing and visualizing the relations of multiple panel categories.

Parameters:

df (DataFrame) – standardized DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.
xcats (List[str]) – exactly two extended categories to be analyzed. If there is a hypothesized explanatory-dependent relation, the first category is the explanatory variable and the second category the explained variable.
cids (List[str]) – cross-sections for which the category relation is analyzed. Default is all in the DataFrame.
start (str) – earliest date in ISO format. Default is None in which case the earliest date in the DataFrame will be used.
end (str) – latest date in ISO format. Default is None in which case the latest date in the DataFrame will be used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the analysis.
years (int) – number of years over which data are aggregated. Supersedes the ‘freq’ parameter and does not allow lags. Default is None, meaning no multi-year aggregation. Note: for plots labelled by single year, use freq=‘A’ for cleaner labels.
val (str) – name of column that contains the values of interest. Default is ‘value’.
freq (str) – letter denoting frequency at which the series are to be sampled. This must be one of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is ‘M’.
lag (int) – lag (delay of arrival) of first (explanatory) category in periods as set by freq. Default is 0. Importantly, for analyses with explanatory and dependent categories, the first category takes the role of the explanatory and a positive lag means that the explanatory values will be deferred into the future, i.e. relate to future values of the explained variable.
xcat_aggs (List[str]) – Exactly two aggregation methods. Default is ‘mean’ for both.
xcat1_chg (str) – time series changes are applied to the first category. Default is None. Options are ‘diff’ (first difference) and ‘pch’ (percentage change). The changes are calculated over the number of periods determined by n_periods.
n_periods (int) – number of periods over which changes of the first category are calculated. Default is 1.
fwin (int) – forward moving average window of second category. Default is 1, i.e. no averaging. Importantly, for analysis with explanatory and dependent categories, the second takes the role of the dependent and a forward window means that the dependent values are averaged forward into the future.
xcat_trims (List[float]) – two-element list with maximum absolute values for the two respective categories. Observations with higher values will be trimmed, i.e. removed from the analysis (not winsorized!). Default is None for both. Trimming is applied after all other transformations.
slip (int) – implied slippage of feature availability for relationship with the target category. This mimics the relationship between trading signals and returns, which is often characterized by a delay due to the setup of positions. Technically, this is a negative lag (early arrival) of the target category in working days prior to any frequency conversion. Default is 0.
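The effect of lag and fwin can be illustrated on a single cross-section with plain pandas. This is a minimal sketch of the semantics described above, not the library's implementation; the series names and data are hypothetical, and a real panel would need the same operations applied per cross-section.

```python
import pandas as pd

# Hypothetical monthly data for one cross-section (illustrative values only).
dates = pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"])
explanatory = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates, name="x")
dependent = pd.Series([10.0, 20.0, 30.0, 40.0], index=dates, name="y")

lag, fwin = 1, 2

# lag: defer the explanatory values by `lag` periods, so x(t - lag) is paired with y(t).
x_lagged = explanatory.shift(lag)

# fwin: forward moving average, i.e. the mean of y(t), ..., y(t + fwin - 1).
y_forward = dependent.rolling(fwin).mean().shift(1 - fwin)

# Only dates where both transformed series are available can be related.
pairs = pd.concat([x_lagged, y_forward], axis=1).dropna()
print(pairs)
```

With these settings the first date is lost to the lag and the last date to the forward window, so two of the four observations remain.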

classmethod intersection_cids(df, xcats, cids)

Returns the cross-sections common to both categories and to the specified ‘cids’ list.

Parameters:

df – standardised DataFrame.
xcats – exactly two extended categories to be checked on.
cids – cross-sections for which the category relation is being analyzed.

Returns: usable (List[str]) – list of the cross-sections common to the two categories.
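The idea of a common cross-section set can be sketched as a set intersection over a standardized DataFrame. The helper name common_cids and the sample data are hypothetical; this illustrates the described behaviour and is not the method's actual code.

```python
import pandas as pd

def common_cids(df, xcats, cids):
    """Cross-sections available for every category in `xcats` and also listed in `cids`.

    Hypothetical helper for illustration only.
    """
    shared = set(cids)
    for xcat in xcats:
        # Keep only cross-sections that have data for this category.
        shared &= set(df.loc[df["xcat"] == xcat, "cid"])
    return sorted(shared)

df = pd.DataFrame({
    "cid": ["AUD", "CAD", "GBP", "AUD", "GBP", "USD"],
    "xcat": ["CPI", "CPI", "CPI", "XR", "XR", "XR"],
    "real_date": pd.to_datetime(["2020-01-01"] * 6),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

usable = common_cids(df, xcats=["CPI", "XR"], cids=["AUD", "CAD", "GBP", "USD"])
print(usable)
```

Here CAD lacks the second category and USD lacks the first, so only AUD and GBP are usable.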

static apply_slip(df, slip, cids, xcats, metrics)

Return type: DataFrame

classmethod time_series(df, change, n_periods, shared_cids, expln_var)

Calculates first differences and percent changes.

Parameters:

df (DataFrame) – multi-index DataFrame hosting the two categories: first column represents the explanatory variable; second column hosts the dependent variable. The DataFrame’s index is the real-date and cross-section.
change (str) – type of change to be applied
n_periods (int) – number of base periods in df over which the change is applied.
shared_cids (List[str]) – shared cross-sections across the two categories and the received list.
expln_var (str) – only the explanatory variable’s data series will be changed from the raw value series to a difference or percentage change value.

Returns: df (pd.DataFrame) – the same multi-index DataFrame, with the explanatory series adjusted in line with the ‘change’ parameter.
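A rough sketch of the two change types, computed per cross-section on a multi-index DataFrame, might look as follows. The helper name apply_change, the index level order (cid, real_date) and the expression of ‘pch’ as a fraction rather than in percent are all assumptions made for illustration.

```python
import pandas as pd

def apply_change(df, change, n_periods, expln_var):
    """Replace the explanatory column by its change over `n_periods`, per cross-section.

    Hypothetical helper; only the column `expln_var` is modified.
    """
    out = df.copy()
    by_cid = out[expln_var].groupby(level="cid")
    if change == "diff":
        # First difference over n_periods within each cross-section.
        out[expln_var] = by_cid.diff(n_periods)
    elif change == "pch":
        # Percentage change over n_periods, expressed as a fraction (assumption).
        out[expln_var] = out[expln_var] / by_cid.shift(n_periods) - 1
    else:
        raise ValueError("change must be 'diff' or 'pch'")
    return out

index = pd.MultiIndex.from_product(
    [["AUD", "USD"], pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"])],
    names=["cid", "real_date"],
)
df = pd.DataFrame(
    {"x": [100.0, 110.0, 121.0, 50.0, 55.0, 66.0], "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=index,
)

diffed = apply_change(df, "diff", n_periods=1, expln_var="x")
pch = apply_change(df, "pch", n_periods=1, expln_var="x")
print(diffed)
print(pch)
```

Grouping by cross-section keeps the first observation of each cross-section from being differenced against the last observation of the previous one.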

classmethod outlier_trim(df, xcats, xcat_trims)

Trim outliers from the dataset.

Parameters:

df (DataFrame) – multi-index DataFrame hosting the two categories. The transformations, to each series, have already been applied.
xcats (List[str]) – explanatory and dependent variable.
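xcat_trims (List[float]) – two-element list with maximum absolute values for the two respective categories, as described for the class parameter of the same name.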

Returns: df (pd.DataFrame) – the same multi-index DataFrame.

N.B.: An outlier is any data point whose absolute value exceeds the threshold given for its category in xcat_trims. Such values are set to NaN and are therefore excluded from any regression modelling and correlation coefficients.
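The trimming rule can be sketched with a boolean mask in pandas. The helper name trim_outliers and the sample values are hypothetical; the point is that trimmed values become NaN rather than being capped at the threshold.

```python
import pandas as pd

def trim_outliers(df, xcats, xcat_trims):
    """Set values whose absolute value exceeds the category's threshold to NaN.

    Hypothetical helper; a threshold of None leaves the category untouched.
    """
    out = df.copy()
    for xcat, trim in zip(xcats, xcat_trims):
        if trim is not None:
            # Keep values within the threshold; everything else becomes NaN.
            out[xcat] = out[xcat].where(out[xcat].abs() <= trim)
    return out

df = pd.DataFrame({"x": [0.5, -3.0, 1.2, 0.1], "y": [2.0, 1.0, -9.0, 0.4]})

trimmed = trim_outliers(df, xcats=["x", "y"], xcat_trims=[2.0, None])
usable = trimmed.dropna()  # rows that would enter a regression or correlation
print(trimmed)
print(usable)
```

Only the second row is dropped here: the value -3.0 in x breaches the threshold of 2.0, while -9.0 in y survives because no threshold is set for that category.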

corr_prob_calc(df_probability, prob_est)

Compute the correlation coefficient and probability statistics.

Parameters:

df_probability (Union[DataFrame, List[DataFrame]]) – pandas DataFrame containing the dependent and explanatory variables.
prob_est (str) – type of estimator for probability of significant relation.

Return type: List[Tuple[float, float]]

N.B.: The method is able to handle multiple DataFrames, and will return the corresponding number of statistics held inside a List.

corr_probability(df_probability, prob_est, time_period='', coef_box_loc='upper left', ax=None)

Add the computed correlation coefficient and probability to a Matplotlib table.

Parameters:

df_probability (Union[DataFrame, List[DataFrame]]) – pandas DataFrame containing the dependent and explanatory variables. Able to handle multiple DataFrames representing different time-periods of the original series.
prob_est (str) – type of estimator for probability of significant relation.
time_period (str) – indicator used to clarify which time-period the statistics are computed for. For example, before 2010 and after 2010: the two periods experience very different macroeconomic conditions. The default is an empty string.
coef_box_loc (str) – location on the graph of the box that holds the statistics. The default is in the upper left corner.
prob_bool – boolean parameter which determines whether the probability value is included in the table. The default is True.
ax (Axes) – Matplotlib Axes object. If None (default), new axes will be created.

annotate_facet(data, **kws)

Annotate each graph within the facet grid.

reg_scatter(title=None, labels=False, size=None, xlab=None, ylab=None, coef_box=None, coef_box_font_size=0, prob_est='pool', fit_reg=True, reg_ci=95, reg_order=1, reg_robust=False, separator=None, title_adj=1, single_chart=False, single_scatter=False, ncol=None, ax=None)

Display scatter-plot and regression line.

Parameters:

title (str) – title of plot. If None (default) an informative title is applied.
labels (bool) – assign a cross-section/period label to each dot. Default is False.
size (Tuple[float]) – width and height of the figure
xlab (str) – x-axis label. Default is no label.
ylab (str) – y-axis label. Default is no label.
fit_reg (bool) – if True (default) adds a regression line.
reg_ci (int) – size of the confidence interval for the regression estimate. Default is 95. Can be None.
reg_order (int) – order of the regression equation. Default is 1 (linear).
reg_robust (bool) – if True, the regression will de-weight outliers, which is computationally expensive. Default is False.
coef_box (str) – two-purpose parameter. If None (default), the correlation coefficient and probability statistics are not included in the graphic. Otherwise, pass the desired location of the statistics box on the graph, which also switches the display on. The options are the standard location strings, i.e. ‘upper left’, ‘lower right’ and so forth.
prob_est (str) – type of estimator for probability of significant relation. The default is “pool”, which means that all observation pairs of a panel are pooled and the probability is based on that pool. The alternative is “map”, denoting Macrosynergy panel test. This is based on a panel regression with period-specific random effects and greatly mitigates the issue of pseudo-replication if panel features and targets are correlated across time. See also https://research.macrosynergy.com/testing-macro-trading-factors/
separator (Union[str, int]) – allows splitting the scatter analysis by cross-section or by time period. In the former case the argument is set to “cids”. In the latter case the argument is set to a year (2010, for instance), which splits the sample into the period before (not including) that year and the period from (including) that year.
title_adj (float) – parameter that sets top of figure to accommodate title. Default is 1.
single_chart (bool) – boolean parameter determining whether the x- and y- labels are only written on a single graph of the Facet Grid (useful if there are numerous charts, and the labels are excessively long). The default is False, and the names of the axis will be displayed on each grid if not conflicting with the label for each variable.
ncol (int) – number of columns in the facet grid. Default is None, in which case the number of columns is determined by the number of cross-sections.
ax (Axes) – Matplotlib Axes object. If None (default), new figure and axes objects will be created. If an Axes object is passed, the plot will be drawn on the Axes, and plt.show() will not be called.
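The year-based split described for separator can be sketched as follows, with a pooled correlation computed for each sub-period. The helper name split_by_year, the index layout and the data are hypothetical, and the sketch reports only the correlation coefficient, not the probability statistic.

```python
import pandas as pd

def split_by_year(df, year):
    """Split into the sample before `year` (exclusive) and from `year` (inclusive).

    Hypothetical helper; assumes an index level named 'real_date'.
    """
    years = df.index.get_level_values("real_date").year
    return df[years < year], df[years >= year]

index = pd.MultiIndex.from_product(
    [["AUD", "USD"],
     pd.to_datetime(["2008-06-30", "2009-06-30", "2010-06-30", "2011-06-30"])],
    names=["cid", "real_date"],
)
df = pd.DataFrame(
    {"x": [1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0, 4.0],
     "y": [2.0, 4.0, 8.0, 6.0, 6.0, 8.0, 4.0, 2.0]},
    index=index,
)

before, after = split_by_year(df, 2010)

# Pooled Pearson correlation within each sub-period.
corr_before = before["x"].corr(before["y"])
corr_after = after["x"].corr(after["y"])
print(corr_before, corr_after)
```

The data are constructed so that the relation flips sign at the split year, which is the kind of regime difference a separator is meant to expose.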

ols_table(type='pool')

Print statsmodels regression summaries.

Parameters:

type (str) – type of linear regression summary to print. Default is ‘pool’. Alternative is ‘re’ for period-specific random effects.