macrosynergy.visuals.correlation

Functions used to visualize correlations across categories or cross-sections of panels.

view_correlation(df, xcats=None, cids=None, xcats_secondary=None, cids_secondary=None, start='2000-01-01', end=None, val='value', freq=None, cluster=False, lags=None, lags_secondary=None, title=None, size=(14, 8), max_color=None, show=True, **kwargs)

Visualize correlation across categories or cross-sections of panels.

Parameters:
  • df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.

  • xcats (Union[str, List[str]]) – extended categories to be correlated. Default is all in the DataFrame. If xcats contains only one category, the correlation coefficients across cross-sections are displayed. If xcats contains more than one category, the correlation coefficients across categories are displayed. The order of the categories in xcats is preserved in the correlation matrix.

  • cids (List[str]) – cross sections to be correlated. Default is all in the DataFrame.

  • xcats_secondary (Union[str, List[str], None]) – an optional second set of extended categories. If xcats_secondary is provided, correlations will be calculated between the categories in xcats and xcats_secondary.

  • cids_secondary (Optional[List[str]]) – an optional second list of cross-sections. If cids_secondary is provided, correlations will be calculated and visualized between the cross-sections in cids and those in cids_secondary.

  • start (str) – earliest date in ISO format. The default in the signature is ‘2000-01-01’. If None is passed, the earliest date in df is used.

  • end (str) – latest date in ISO format. Default is None and latest date in df is used.

  • val (str) – name of column that contains the values of interest. Default is ‘value’.

  • freq (str) – frequency option. By default the correlations are calculated at the native frequency of the dates in ‘real_date’, which is business daily. Down-sampling options are weekly (‘W’), monthly (‘M’), or quarterly (‘Q’); each down-sampled value is the mean over the period.

  • cluster (bool) – if True the series in the correlation matrix are reordered by hierarchical clustering. Default is False.

  • lags (dict) – optional dictionary of lags applied to respective categories. Each key is a category and each value is the lag or, if several lags are applied to that category, a list of lags. The lag is appended to the category name in the correlation matrix. If xcats_secondary is not None, this parameter specifies lags for the categories in xcats. N.B.: Lags can include a 0 if the original series should also be correlated.

  • lags_secondary (Optional[dict]) – optional dictionary of lags applied to the second set of categories if xcats_secondary is provided.

  • title (str) – chart heading. If none is given, a default title is used.

  • size (Tuple[float]) – two-element tuple setting width/height of figure. Default is (14, 8).

  • max_color (float) – maximum absolute correlation coefficient for the color scale. Default is None. If a value is given, it applies symmetrically to positive and negative values.

  • show (bool) – if True the figure will be displayed. Default is True.

  • **kwargs

    Arbitrary keyword arguments that are passed to seaborn.heatmap.
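As a rough illustration of how the parameters above fit together, the sketch below builds a small synthetic panel in the long format described for df and wraps two calls in a helper. The tickers, the random values, and the helper name plot_examples are made up for this example. The import path is taken from the module name in the page heading, and the calls are only expected to run where macrosynergy is installed.

```python
import numpy as np
import pandas as pd

# Synthetic long-format panel: one row per (cid, xcat, real_date) with a
# single 'value' column. Cross-sections and categories are hypothetical.
dates = pd.bdate_range("2020-01-01", periods=60)
rng = np.random.default_rng(0)
frames = []
for cid in ["AUD", "CAD", "GBP"]:
    for xcat in ["XR", "CRY"]:
        frames.append(
            pd.DataFrame(
                {
                    "cid": cid,
                    "xcat": xcat,
                    "real_date": dates,
                    "value": rng.normal(size=len(dates)),
                }
            )
        )
df = pd.concat(frames, ignore_index=True)


def plot_examples(panel: pd.DataFrame) -> None:
    """Hypothetical helper: call view_correlation in the two documented modes."""
    # Import inside the helper so the data setup runs without macrosynergy.
    from macrosynergy.visuals.correlation import view_correlation

    # One category: correlations across cross-sections, on weekly means.
    view_correlation(
        panel, xcats="XR", cids=["AUD", "CAD", "GBP"],
        start="2020-01-01", freq="W", show=False,
    )
    # Two categories: correlations across categories. 'CRY' enters both
    # unlagged (0) and lagged by one period; the color scale is capped at 0.5.
    view_correlation(
        panel, xcats=["XR", "CRY"], start="2020-01-01",
        lags={"CRY": [0, 1]}, max_color=0.5, show=False,
    )
```

Calling plot_examples(df) in an environment with macrosynergy installed should build both heatmaps without displaying them, since show=False.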

N.B.: The function displays the heatmap of a correlation matrix across categories or across cross-sections, depending on which of the two parameters received multiple elements.
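To make the two modes concrete, here is a minimal pandas-only sketch of the kind of matrix the heatmap would show. It is not the library's implementation: the helper names (downsample, corr_across_cids, corr_across_xcats), the tickers, and the random data are assumptions for illustration, and pooling all cross-sections in the multi-category case is this sketch's own choice.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format panel with columns cid, xcat, real_date, value.
dates = pd.bdate_range("2020-01-01", periods=120)
rng = np.random.default_rng(1)
df = pd.concat(
    [
        pd.DataFrame({"cid": c, "xcat": x, "real_date": dates,
                      "value": rng.normal(size=len(dates))})
        for c in ["AUD", "CAD", "GBP"]
        for x in ["XR", "CRY"]
    ],
    ignore_index=True,
)


def downsample(panel, freq):
    """Replace each series by its period mean (e.g. freq='W' for weekly)."""
    grouper = pd.Grouper(key="real_date", freq=freq)
    return panel.groupby(["cid", "xcat", grouper])["value"].mean().reset_index()


def corr_across_cids(panel, xcat, freq=None):
    """One category: columns are cross-sections, rows are dates."""
    if freq is not None:
        panel = downsample(panel, freq)
    one = panel[panel["xcat"] == xcat]
    wide = one.pivot(index="real_date", columns="cid", values="value")
    return wide.corr()


def corr_across_xcats(panel, xcats, freq=None):
    """Several categories: columns are categories, rows pool all cross-sections."""
    if freq is not None:
        panel = downsample(panel, freq)
    sub = panel[panel["xcat"].isin(xcats)]
    wide = sub.set_index(["cid", "real_date", "xcat"])["value"].unstack("xcat")
    # Selecting columns by xcats keeps the caller's ordering in the matrix.
    return wide[xcats].corr()


by_cid = corr_across_cids(df, "XR", freq="W")   # 3 x 3, one row per cross-section
by_xcat = corr_across_xcats(df, ["XR", "CRY"])  # 2 x 2, one row per category
```

In the first call the matrix is indexed by cross-section; in the second it is indexed by category, in the order the categories were passed.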

lag_series(df_w, lags, xcats)

Method used to lag respective categories.

Parameters:
  • df_w (DataFrame) – multi-index DataFrame where the columns are the categories, and the two indices are the cross-sections and real-dates.

  • lags (dict) – dictionary of lags applied to respective categories.

  • xcats (List[str]) – extended categories to be correlated.

Return type:

Tuple[DataFrame, Dict[str, List[str]]]
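The lagging step can be sketched with plain pandas as follows. This illustrates the idea and is not the library's code: the function name lag_columns, the "_L<lag>" column suffix, the decision to drop the unlagged column, and the meaning given to the returned dictionary are all assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

import numpy as np
import pandas as pd


def lag_columns(
    df_w: pd.DataFrame, lags: Dict[str, Union[int, List[int]]]
) -> Tuple[pd.DataFrame, Dict[str, List[str]]]:
    """Hypothetical helper: lag selected category columns within each cross-section."""
    out = df_w.copy()
    names: Dict[str, List[str]] = {}
    for xcat, lag_spec in lags.items():
        lag_list = [lag_spec] if isinstance(lag_spec, int) else list(lag_spec)
        names[xcat] = []
        for lag in lag_list:
            col = f"{xcat}_L{lag}"  # assumed naming convention
            # Shift within each cid so values never leak across cross-sections.
            out[col] = df_w.groupby(level="cid")[xcat].shift(lag)
            names[xcat].append(col)
        # The original column is replaced by its lagged versions.
        out = out.drop(columns=xcat)
    return out, names


# Wide frame: index levels (cid, real_date), one column per category.
dates = pd.bdate_range("2020-01-01", periods=5)
index = pd.MultiIndex.from_product([["AUD", "CAD"], dates], names=["cid", "real_date"])
df_w = pd.DataFrame(
    {"XR": np.arange(10, dtype=float), "CRY": np.arange(10, dtype=float)},
    index=index,
)

lagged, new_names = lag_columns(df_w, {"CRY": [0, 1]})
```

Because the shift is applied per cross-section, the first date of every cross-section is NaN in the lagged column rather than carrying over the last value of the previous cross-section.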