macrosynergy.management.utils

get_cid(ticker)

Returns the cross-sectional identifier (cid) from a ticker.

Parameters:

ticker (Union[str, Iterable[str]]) – The ticker to be converted.

Returns:

The cross-sectional identifier, or a list of identifiers if an iterable of tickers is passed.

Return type:

Union[str, List[str]]

get_xcat(ticker)

Returns the category (xcat) from a ticker.

Parameters:

ticker (Union[str, Iterable[str]]) – The ticker to be converted.

Returns:

The category.

Return type:

str

split_ticker(ticker, mode)

Returns either the cross-sectional identifier (cid) or the category (xcat) from a ticker. The function is overloaded to accept either a single ticker or an iterable (e.g. list, tuple, pd.Series, np.array) of tickers.

Parameters:
  • ticker (Union[str, Iterable[str]]) – The ticker to be converted.

  • mode (str) – The mode to be used. Must be either “cid” or “xcat”.

Returns:

The cross-sectional identifier or category, or a list of them if an iterable of tickers is passed.

Return type:

Union[str, List[str]]
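JPMaQS tickers join the cross-section and the category with an underscore, and categories may themselves contain underscores (e.g. ‘USD_FXXR_NSA’), so splitting at the first underscore recovers both parts. The following is a minimal sketch under that assumption; `split_ticker_sketch` is a hypothetical name, not the library's implementation.

```python
from typing import Iterable, List, Union


def split_ticker_sketch(ticker: Union[str, Iterable[str]], mode: str) -> Union[str, List[str]]:
    """Return the cid or xcat part of one ticker, or of each ticker in an iterable."""
    if mode not in ("cid", "xcat"):
        raise ValueError("mode must be either 'cid' or 'xcat'")
    if not isinstance(ticker, str):
        # Lists, tuples, pd.Series and np.array are handled element-wise.
        return [split_ticker_sketch(t, mode) for t in ticker]
    if "_" not in ticker:
        raise ValueError(f"Ticker {ticker!r} has no underscore separator")
    # Split only at the first underscore: the category may contain underscores.
    cid, xcat = ticker.split("_", 1)
    return cid if mode == "cid" else xcat


print(split_ticker_sketch("USD_FXXR_NSA", "cid"))  # USD
print(split_ticker_sketch(["USD_FXXR_NSA", "EUR_CPIXFE_SA"], "xcat"))  # ['FXXR_NSA', 'CPIXFE_SA']
```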

is_valid_iso_date(date)

Return type:

bool

convert_iso_to_dq(date)

Return type:

str

convert_dq_to_iso(date)

Return type:

str
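The three date helpers above carry no description. As a hedged illustration, the sketch below assumes that "ISO" means the ‘YYYY-MM-DD’ format and that the DataQuery ("dq") format is ‘YYYYMMDD’; both formats are assumptions, and `is_iso_date`, `iso_to_dq` and `dq_to_iso` are hypothetical stand-ins rather than the documented functions.

```python
from datetime import datetime


def is_iso_date(date) -> bool:
    """True if `date` is a valid date written exactly as YYYY-MM-DD."""
    try:
        # Round-tripping rejects loose forms such as '2020-1-5'.
        return datetime.strptime(date, "%Y-%m-%d").strftime("%Y-%m-%d") == date
    except (ValueError, TypeError):
        return False


def iso_to_dq(date: str) -> str:
    """'2020-01-31' -> '20200131' (assumed DataQuery format)."""
    if not is_iso_date(date):
        raise ValueError(f"Not an ISO 8601 date: {date!r}")
    return datetime.strptime(date, "%Y-%m-%d").strftime("%Y%m%d")


def dq_to_iso(date: str) -> str:
    """'20200131' -> '2020-01-31'."""
    return datetime.strptime(date, "%Y%m%d").strftime("%Y-%m-%d")
```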

form_full_url(url, params={})

Forms a full URL from a base URL and a dictionary of parameters. Useful for logging and debugging.

Parameters:
  • url (str) – base URL.

  • params (Dict) – dictionary of parameters.

Returns:

full URL

Return type:

str
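A full URL of this kind is typically the base URL followed by "?" and the URL-encoded parameters. The sketch below shows that idea with the standard library's `urllib.parse.urlencode`; the exact escaping used by `form_full_url` may differ, and `form_full_url_sketch` is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def form_full_url_sketch(url: str, params: Optional[Dict] = None) -> str:
    """Append URL-encoded query parameters to a base URL (illustrative only)."""
    if not params:
        return url
    # urlencode escapes special characters (spaces become '+'); doseq=True
    # expands list values into repeated keys: {"e": ["a", "b"]} -> "e=a&e=b".
    return f"{url}?{urlencode(params, doseq=True)}"


print(form_full_url_sketch("https://example.com/api", {"a": 1, "b": "x y"}))
# https://example.com/api?a=1&b=x+y
```

The sketch defaults `params` to None rather than `{}` to sidestep Python's shared-mutable-default pitfall; since the function never mutates `params`, the documented `{}` default is harmless in practice.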

common_cids(df, xcats)

Returns a list of cross-sectional identifiers (cids) for which all the specified categories (xcats) are available.

Parameters:
  • df (DataFrame) – Standardized JPMaQS DataFrame with necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.

  • xcats (List[str]) – A list with at least two categories whose cross-sectional identifiers are being considered.

Returns:

List of cross-sectional identifiers for which all categories in xcats are available.
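The idea reduces to a set intersection: collect the cids present for each category and keep those common to all of them. This is a minimal pandas sketch, assuming "available" means at least one non-NaN ‘value’; `common_cids_sketch` is a hypothetical name.

```python
from typing import List

import pandas as pd


def common_cids_sketch(df: pd.DataFrame, xcats: List[str]) -> List[str]:
    """Return the sorted cids that have data for every category in `xcats`."""
    if len(xcats) < 2:
        raise ValueError("xcats must contain at least two categories")
    # Assumption: a cid counts as available only if it has a non-NaN value.
    dfx = df.dropna(subset=["value"])
    cid_sets = [set(dfx.loc[dfx["xcat"] == xcat, "cid"]) for xcat in xcats]
    return sorted(set.intersection(*cid_sets))
```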

generate_random_date(start='1990-01-01', end='2020-01-01')

Generates a random date between two dates.

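Parameters:
  • start (str) – the earliest date of the range, in ISO 8601 format. Default is ‘1990-01-01’.

  • end (str) – the latest date of the range, in ISO 8601 format. Default is ‘2020-01-01’.

Returns:

The random date.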

Return type:

str
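One way to draw a random date is to pick a random day offset from the start date. The sketch assumes both bounds are inclusive, that every calendar day is equally likely and that the result is formatted as ‘YYYY-MM-DD’; none of this is confirmed above, and `generate_random_date_sketch` is a hypothetical name.

```python
import random
from datetime import date, timedelta


def generate_random_date_sketch(start: str = "1990-01-01", end: str = "2020-01-01") -> str:
    """Draw a uniformly random calendar day in [start, end] as YYYY-MM-DD."""
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start must not be after end")
    # random.randint is inclusive at both ends, so `end` itself can be drawn.
    offset = random.randint(0, (end_d - start_d).days)
    return (start_d + timedelta(days=offset)).isoformat()
```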

get_dict_max_depth(d)

Returns the maximum depth of a dictionary.

Parameters:

d (dict) – The dictionary to be searched.

Returns:

The maximum depth of the dictionary.

Return type:

int
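Maximum depth is naturally computed recursively. The sketch below assumes the convention that a non-dict value has depth 0 and each level of nesting adds 1 (so an empty dict has depth 1); the library may count differently. `get_dict_max_depth_sketch` is a hypothetical name.

```python
from typing import Any


def get_dict_max_depth_sketch(d: Any) -> int:
    """Depth of nested dicts: a non-dict counts as 0, each dict level adds 1."""
    if not isinstance(d, dict):
        return 0
    # default=0 handles empty dicts, which therefore have depth 1.
    return 1 + max((get_dict_max_depth_sketch(v) for v in d.values()), default=0)


print(get_dict_max_depth_sketch({"a": 1, "b": {"c": {"d": 2}}}))  # 3
```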

rec_search_dict(d, key, match_substring=False, match_type=None)

Recursively searches a dictionary for a key and returns the value associated with it.

Parameters:
  • d (dict) – The dictionary to be searched.

  • key (str) – The key to be searched for.

  • match_substring (bool) – If True, the function will return the value of the first key that contains the substring specified by the key parameter. If False, the function will return the value of the first key that matches the key parameter exactly. Default is False.

  • match_type – If not None, the function will look for a key that matches the search parameters and has the specified type. Default is None.

Returns:

The value associated with the key, or None if the key is not found.
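A recursive search of this kind can be sketched as a depth-first walk that checks each key against the name and type conditions before descending into nested dictionaries. The traversal order (depth-first, in insertion order) is an assumption, so "first" match may not coincide with the library's; `rec_search_dict_sketch` is a hypothetical name.

```python
from typing import Any, Optional


def rec_search_dict_sketch(d: dict, key: str, match_substring: bool = False,
                           match_type: Optional[type] = None) -> Any:
    """Depth-first search for `key`; returns the first matching value or None."""
    if not isinstance(d, dict):
        return None
    for k, v in d.items():
        if match_substring:
            name_ok = isinstance(k, str) and key in k
        else:
            name_ok = k == key
        type_ok = match_type is None or isinstance(v, match_type)
        if name_ok and type_ok:
            return v
        # Descend into a nested dict before moving on to the next key.
        if isinstance(v, dict):
            found = rec_search_dict_sketch(v, key, match_substring, match_type)
            if found is not None:
                return found
    return None
```

Because None doubles as the "not found" signal, a key whose stored value is itself None cannot be distinguished from a missing key; the documented return value has the same property.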

class Timer

Bases: object

timer()

Return type:

Tuple[float, float]

lap()

Return type:

float

check_package_version(required_version)

standardise_dataframe(df, verbose=False)

Applies the standard JPMaQS Quantamental DataFrame format to a DataFrame.

Parameters:
  • df (DataFrame) – The DataFrame to be standardized.

  • verbose (bool) – Whether to print warnings if the DataFrame is not in the correct format.

Returns:

The standardized DataFrame.

Raises:
  • TypeError – If the input is not a pandas DataFrame.

  • ValueError – If the input DataFrame is not in the correct format.

Return type:

QuantamentalDataFrame
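What "the standard format" involves is not spelled out above. Based on the column requirements stated elsewhere on this page, a plausible minimal version checks for ‘cid’, ‘xcat’ and ‘real_date’ plus at least one metric column, converts ‘real_date’ to datetimes and sorts the rows. This is a hedged sketch only: it returns a plain DataFrame rather than a QuantamentalDataFrame, and `standardise_sketch` is a hypothetical name.

```python
import pandas as pd


def standardise_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce a long-format frame to cid/xcat/real_date followed by metric columns."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame")
    required = ["cid", "xcat", "real_date"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    metrics = [c for c in df.columns if c not in required]
    if not metrics:
        raise ValueError("At least one metric column (e.g. 'value') is required")
    out = df[required + metrics].copy()
    out["real_date"] = pd.to_datetime(out["real_date"])
    # Sort so that each ticker's series is contiguous and in date order.
    return out.sort_values(required).reset_index(drop=True)
```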

drop_nan_series(df, column='value', raise_warning=False)

Drops any series that consist entirely of NaNs. If raise_warning is True, a UserWarning is raised when any series are dropped.

Parameters:
  • df (DataFrame) – The dataframe to be cleaned.

  • column (str) – The column to be used as the value column, defaults to “value”.

  • raise_warning (bool) – Whether to raise a warning if any series are dropped.

Returns:

The cleaned DataFrame.

Raises:
  • TypeError – If the input is not a pandas DataFrame.

  • ValueError – If the input DataFrame is not in the correct format.

Return type:

QuantamentalDataFrame
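A series here is one (cid, xcat) combination, so the check can be done per group: keep every row whose series has at least one non-NaN value. This is a minimal pandas sketch of that idea; the warning text is illustrative, and `drop_nan_series_sketch` is a hypothetical name.

```python
import warnings

import pandas as pd


def drop_nan_series_sketch(df: pd.DataFrame, column: str = "value",
                           raise_warning: bool = False) -> pd.DataFrame:
    """Remove every (cid, xcat) series whose `column` values are all NaN."""
    # For each row, flag whether its series has at least one non-NaN value.
    has_data = df[column].notna().groupby([df["cid"], df["xcat"]]).transform("any")
    if raise_warning and not has_data.all():
        dropped = df.loc[~has_data, ["cid", "xcat"]].drop_duplicates()
        for cid, xcat in dropped.itertuples(index=False):
            warnings.warn(f"Dropping all-NaN series {cid}_{xcat}", UserWarning)
    return df.loc[has_data].reset_index(drop=True)
```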

qdf_to_ticker_df(df, value_column='value')

Converts a standardized JPMaQS DataFrame to a wide format DataFrame with each column representing a ticker.

Parameters:
  • df (DataFrame) – A standardised quantamental dataframe.

  • value_column (str) – The column to be used as the value column, defaults to “value”. If the specified column is not present in the DataFrame, a column named “value” will be used. If there is no column named “value”, the first column in the DataFrame will be used instead.

Returns:

The converted DataFrame.

Return type:

DataFrame
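The long-to-wide conversion is essentially a pivot on ‘real_date’ with one column per ticker, where a ticker is the cid and xcat joined by an underscore. The sketch below shows only that core step; it omits the fallback rules for a missing value column and assumes no duplicate (date, ticker) pairs, since `DataFrame.pivot` raises on duplicates. `qdf_to_ticker_df_sketch` is a hypothetical name.

```python
import pandas as pd


def qdf_to_ticker_df_sketch(df: pd.DataFrame, value_column: str = "value") -> pd.DataFrame:
    """Pivot long cid/xcat/real_date rows into a date x ticker table."""
    tickers = df["cid"] + "_" + df["xcat"]
    wide = df.assign(ticker=tickers).pivot(index="real_date", columns="ticker", values=value_column)
    wide.columns.name = None  # drop the leftover 'ticker' axis label
    return wide
```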

ticker_df_to_qdf(df)

Converts a wide format DataFrame (with each column representing a ticker) to a standardized JPMaQS DataFrame.

Parameters:

df (DataFrame) – A wide format DataFrame.

Returns:

The converted DataFrame.

Return type:

QuantamentalDataFrame
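The reverse direction can be sketched as a melt followed by splitting each ticker at its first underscore. Two assumptions are made here: the wide DataFrame is indexed by date, and the NaN cells that appear where a ticker has no observation are dropped. `ticker_df_to_qdf_sketch` is a hypothetical name and returns a plain DataFrame.

```python
import pandas as pd


def ticker_df_to_qdf_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Melt a date x ticker table back into cid/xcat/real_date/value rows."""
    long = df.copy()
    long.index.name = "real_date"
    long = long.reset_index().melt(id_vars="real_date", var_name="ticker", value_name="value")
    # Split at the first underscore only: categories may contain underscores.
    parts = long["ticker"].str.split("_", n=1, expand=True)
    long["cid"], long["xcat"] = parts[0], parts[1]
    # Assumption: cells without an observation are not kept as NaN rows.
    long = long.dropna(subset=["value"])
    cols = ["cid", "xcat", "real_date", "value"]
    return long[cols].sort_values(cols[:3]).reset_index(drop=True)
```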

apply_slip(df, slip, cids=None, xcats=None, tickers=None, metrics=['value'], raise_error=True)

Applies a slip, i.e. a negative lag, to the DataFrame for the given cross-sections and categories, on the given metrics.

Parameters:
  • df (DataFrame) – DataFrame to which the slip is applied.

  • slip (int) – Slip to be applied.

  • cids (Optional[List[str]]) – List of cross-sections.

  • xcats (Optional[List[str]]) – List of target categories.

  • metrics (List[str]) – List of metrics to which the slip is applied.

Returns:

DataFrame with the slip applied.

Raises:
  • TypeError – If the provided parameters are not of the expected type.

  • ValueError – If the provided parameters are semantically incorrect.

Return type:

QuantamentalDataFrame
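A slip can be pictured as shifting each selected series by a fixed number of observations, so that a value observed on one date only becomes usable `slip` observations later. The sketch assumes this means a per-(cid, xcat) pandas `shift(slip)` on the chosen metrics and leaves all other series untouched; the handling of `tickers` and `raise_error` is omitted, and `apply_slip_sketch` is a hypothetical name.

```python
from typing import Iterable, Optional

import pandas as pd


def apply_slip_sketch(df: pd.DataFrame, slip: int, xcats: Iterable[str],
                      cids: Optional[Iterable[str]] = None,
                      metrics: Iterable[str] = ("value",)) -> pd.DataFrame:
    """Shift the selected series `slip` observations later in time."""
    if not isinstance(slip, int) or slip < 0:
        raise ValueError("slip must be a non-negative integer")
    out = df.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)
    mask = out["xcat"].isin(list(xcats))
    if cids is not None:
        mask &= out["cid"].isin(list(cids))
    for metric in metrics:
        # Shift within each (cid, xcat) series so values never cross series.
        shifted = out.groupby(["cid", "xcat"])[metric].shift(slip)
        out.loc[mask, metric] = shifted[mask]
    return out
```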

downsample_df_on_real_date(df, groupby_columns=[], freq='M', agg='mean')

Downsample a JPMaQS DataFrame along the ‘real_date’ column.

Parameters:
  • df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.

  • groupby_columns (List[str]) – a list of columns used to group the DataFrame.

  • freq (str) – target frequency for downsampling: weekly (‘W’), monthly (‘M’, the default) or quarterly (‘Q’). The native frequency of the datetimes in ‘real_date’ is business daily.

  • agg (str) – aggregation method. Must be one of “mean” (default), “median”, “min”, “max”, “first” or “last”.

Returns:

the downsampled DataFrame.
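Downsampling amounts to grouping by the chosen columns and by the period that contains each ‘real_date’, then aggregating. The sketch below uses `Series.dt.to_period` and labels each period by its calendar end date; the library may label periods differently (e.g. by last business day), its default grouping is the empty list rather than cid and xcat, and `downsample_sketch` is a hypothetical name. Value columns are assumed numeric.

```python
from typing import Iterable

import pandas as pd


def downsample_sketch(df: pd.DataFrame, groupby_columns: Iterable[str] = ("cid", "xcat"),
                      freq: str = "M", agg: str = "mean") -> pd.DataFrame:
    """Aggregate each group's values over the periods defined by `freq`."""
    allowed = {"mean", "median", "min", "max", "first", "last"}
    if agg not in allowed:
        raise ValueError(f"agg must be one of {sorted(allowed)}")
    keys = list(groupby_columns)
    value_cols = [c for c in df.columns if c not in set(keys) | {"real_date"}]
    period = df["real_date"].dt.to_period(freq)  # e.g. Period('2020-01', 'M')
    out = df.groupby(keys + [period])[value_cols].agg(agg).reset_index()
    # Label each period by its calendar end date.
    out["real_date"] = out["real_date"].dt.end_time.dt.normalize()
    return out
```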

update_df(df, df_add, xcat_replace=False)

Append a standardised DataFrame to a standardised base DataFrame, replacing the base data of any tickers that appear in both.

Parameters:
  • df (DataFrame) – standardised base JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.

  • df_add (DataFrame) – another standardised JPMaQS DataFrame, with the latest values, to be added with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’, and ‘value’. Columns that are present in the base DataFrame but not in the appended DataFrame will be populated with NaN values.

  • xcat_replace (bool) – if True, all series belonging to the categories in the added DataFrame will be replaced, rather than just the added tickers. Default is False. N.B.: tickers are combinations of cross-sections and categories.

Returns:

standardised DataFrame with the latest values of the modified or newly defined tickers added.
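The replacement logic can be sketched as: drop the base rows that the new data supersedes (matching tickers, or whole categories when `xcat_replace` is True), then concatenate the new rows. This is a minimal illustration with a hypothetical name, `update_df_sketch`, not the library's code; columns missing from `df_add` come out as NaN through `pd.concat`.

```python
import pandas as pd


def update_df_sketch(df: pd.DataFrame, df_add: pd.DataFrame, xcat_replace: bool = False) -> pd.DataFrame:
    """Replace superseded series in `df` with the rows of `df_add`."""
    if xcat_replace:
        # Every series of every category present in df_add is superseded.
        stale = df["xcat"].isin(df_add["xcat"].unique())
    else:
        # Only the exact tickers (cid_xcat) present in df_add are superseded.
        add_tickers = set(df_add["cid"] + "_" + df_add["xcat"])
        stale = (df["cid"] + "_" + df["xcat"]).isin(add_tickers)
    out = pd.concat([df.loc[~stale], df_add], ignore_index=True)
    return out.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)
```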

update_tickers(df, df_add)

Update the aggregate DataFrame at the ticker level.

Parameters:
  • df (DataFrame) – aggregate DataFrame used to store all tickers.

  • df_add (DataFrame) – DataFrame with the latest values.

update_categories(df, df_add)

Update the DataFrame at the category level.

Parameters:
  • df (DataFrame) – base DataFrame.

  • df_add – appended DataFrame.

reduce_df(df, xcats=None, cids=None, start=None, end=None, blacklist=None, out_all=False, intersect=False)

Filter DataFrame by xcats and cids and notify about missing xcats and cids.

Parameters:
  • df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.

  • xcats (Union[str, List[str]]) – extended categories to be filtered on. Default is all in the DataFrame.

  • cids (List[str]) – cross-sections to be filtered on. Default is all in the DataFrame.

  • start (str) – string in ISO 8601 format representing the earliest date. Default is None.

  • end (str) – string in ISO 8601 format representing the latest date. Default is None.

  • blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.

  • out_all (bool) – if True, the function returns the reduced DataFrame together with the selected and available xcats and cids. Default is False, i.e. only the DataFrame is returned.

  • intersect (bool) – if True only retains cids that are available for all xcats. Default is False.

Returns:

reduced DataFrame with duplicates removed or, if out_all is True, the reduced DataFrame together with the available and selected xcats and cids.
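The filtering steps above can be sketched as successive boolean masks on ‘xcat’, ‘cid’ and ‘real_date’, followed by removal of blacklisted periods. The sketch assumes blacklist values are [start, end] date pairs and that numbered keys such as ‘CHF_1’ and ‘CHF_2’ refer to the cross-section before the underscore; it prints missing categories as its "notification" and does not implement `out_all` or `intersect`. `reduce_df_sketch` is a hypothetical name.

```python
from typing import Dict, List, Optional, Union

import pandas as pd


def reduce_df_sketch(df: pd.DataFrame, xcats: Optional[Union[str, List[str]]] = None,
                     cids: Optional[List[str]] = None, start: Optional[str] = None,
                     end: Optional[str] = None,
                     blacklist: Optional[Dict[str, List[str]]] = None) -> pd.DataFrame:
    """Filter by category, cross-section, date range and blacklist periods."""
    out = df.copy()
    if xcats is not None:
        xcats = [xcats] if isinstance(xcats, str) else list(xcats)
        missing = sorted(set(xcats) - set(out["xcat"]))
        if missing:
            print(f"Missing categories: {missing}")
        out = out[out["xcat"].isin(xcats)]
    if cids is not None:
        out = out[out["cid"].isin(cids)]
    if start is not None:
        out = out[out["real_date"] >= pd.Timestamp(start)]
    if end is not None:
        out = out[out["real_date"] <= pd.Timestamp(end)]
    for key, (bl_start, bl_end) in (blacklist or {}).items():
        cid = key.split("_")[0]  # assumed: 'CHF_1' -> 'CHF'
        in_period = out["real_date"].between(pd.Timestamp(bl_start), pd.Timestamp(bl_end))
        out = out[~((out["cid"] == cid) & in_period)]
    return out.drop_duplicates().reset_index(drop=True)
```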

reduce_df_by_ticker(df, ticks=None, start=None, end=None, blacklist=None)

Filter a DataFrame by tickers and dates, and notify about missing xcats and cids.

Parameters:
  • df (DataFrame) – standardized dataframe with the following columns: ‘cid’, ‘xcat’, ‘real_date’.

  • ticks (List[str]) – tickers, i.e. combinations of cross-sections and base categories (e.g. ‘USD_FXXR_NSA’).

  • start (str) – string in ISO 8601 format representing the earliest date. Default is None.

  • end (str) – string in ISO 8601 format representing the latest date. Default is None.

  • blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.

Returns:

reduced DataFrame with duplicates removed.

categories_df(df, xcats, cids=None, val='value', start=None, end=None, blacklist=None, years=None, freq='M', lag=0, fwin=1, xcat_aggs=['mean', 'mean'])

Create a custom DataFrame of two or more categories, sampled at the appropriate frequency and, if applicable, with lags.

Parameters:
  • df (DataFrame) – standardized JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.

  • xcats (List[str]) – extended categories involved in the custom DataFrame. The last category in the list represents the dependent variable, and the (n - 1) preceding categories will be the explanatory variable(s).

  • cids (List[str]) – cross-sections to be included. Default is all in the DataFrame.

  • val (str) – name of column that contains the values of interest. Default is ‘value’.

  • start (str) – earliest date in ISO 8601 format. Default is None, i.e. earliest date in DataFrame is used.

  • end (str) – latest date in ISO 8601 format. Default is None, i.e. latest date in DataFrame is used.

  • blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.

  • years (int) – number of years over which data are aggregated. Supersedes the “freq” parameter and does not allow lags. Default is None, i.e. no multi-year aggregation.

  • freq (str) – letter denoting the frequency at which the series are to be sampled. Must be one of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is ‘M’. The sampled date is always the last business day of the respective period.

  • lag (int) – lag (delay of arrival) of explanatory category(s) in periods as set by freq. Default is 0.

  • fwin (int) – forward moving average window of first category. Default is 1, i.e. no average. Note: This parameter is used mainly for target returns as the dependent variable.

  • xcat_aggs (List[str]) – exactly two aggregation methods. Default is ‘mean’ for both. The same aggregation method, the first method in the parameter, will be used for all explanatory variables.

Returns:

custom DataFrame with category columns. All rows that contain NaNs will be excluded.

N.B.: The number of explanatory categories that can be included is not restricted and will be appended column-wise to the returned DataFrame. The order of the DataFrame’s columns will reflect the order of the categories list.
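The core of such a panel is: aggregate each category per cross-section and period, put the categories side by side in the order of `xcats`, lag the explanatory columns within each cross-section, and drop incomplete rows. The sketch below applies a single aggregation method to all categories (instead of the two in `xcat_aggs`), indexes rows by calendar period rather than last business day, assumes no periods are missing between observations, and ignores `years`, `fwin` and `blacklist`. `categories_df_sketch` is a hypothetical name.

```python
from typing import List

import pandas as pd


def categories_df_sketch(df: pd.DataFrame, xcats: List[str], freq: str = "M",
                         lag: int = 0, agg: str = "mean") -> pd.DataFrame:
    """(cid, period) x xcat panel; explanatory columns lagged by `lag` periods."""
    dfx = df[df["xcat"].isin(xcats)].copy()
    dfx["period"] = dfx["real_date"].dt.to_period(freq)
    panel = (
        dfx.groupby(["cid", "period", "xcat"])["value"].agg(agg)
        .unstack("xcat")
        .reindex(columns=xcats)  # keep the order of xcats; last = dependent
    )
    if lag > 0:
        explanatory = xcats[:-1]
        # Shift within each cross-section so values never leak across cids.
        panel[explanatory] = panel.groupby(level="cid")[explanatory].shift(lag)
    # Rows containing any NaN are excluded.
    return panel.dropna()
```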

categories_df_aggregation_helper(dfx, xcat_agg)

Helper method to downsample each category in the DataFrame by aggregating over the intermediate dates according to a prescribed method.

Parameters:
  • dfx (DataFrame) – standardised DataFrame defined exclusively on a single category.

  • xcat_agg (str) – associated aggregation method for the respective category.

weeks_btwn_dates(start_date, end_date)

Returns the number of business weeks between two dates.

Return type:

int

months_btwn_dates(start_date, end_date)

Returns the number of months between two dates.

Return type:

int

years_btwn_dates(start_date, end_date)

Returns the number of years between two dates.

Return type:

int

quarters_btwn_dates(start_date, end_date)

Returns the number of quarters between two dates.

Return type:

int
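"Number of periods between two dates" has more than one reasonable definition. For the month, quarter and year variants, one common choice is to count the calendar period boundaries crossed, using only the year and month fields. The sketch below follows that convention, which is an assumption (the library may instead count complete periods); `months_between`, `quarters_between` and `years_between` are hypothetical names.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Calendar month boundaries crossed from `start` to `end`."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def quarters_between(start: date, end: date) -> int:
    """Calendar quarter boundaries crossed (quarter index = (month - 1) // 3)."""
    return (end.year - start.year) * 4 + ((end.month - 1) // 3 - (start.month - 1) // 3)


def years_between(start: date, end: date) -> int:
    """Calendar year boundaries crossed."""
    return end.year - start.year
```

Under this convention, 2020-03-31 and 2020-04-01 are one quarter apart even though they are consecutive days, which is usually what is wanted when sizing end-of-period date grids.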

get_eops(dates=None, start_date=None, end_date=None, freq='M')

Returns a series of end-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.

Parameters:
  • freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.

  • dates (Union[DatetimeIndex, Series, Iterable[Timestamp], None]) – The dates to be used to generate the end-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.

  • start_date (Union[str, Timestamp, None]) – The start date. Must be passed if dates is not passed.

  • end_date (Union[str, Timestamp, None]) – The end date. Must be passed if dates is not passed.

Return type:

Series

get_sops(dates=None, start_date=None, end_date=None, freq='M')

Returns a series of start-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.

Parameters:
  • freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.

  • dates (Union[DatetimeIndex, Series, Iterable[Timestamp], None]) – The dates to be used to generate the start-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.

  • start_date (Union[str, Timestamp, None]) – The start date. Must be passed if dates is not passed.

  • end_date (Union[str, Timestamp, None]) – The end date. Must be passed if dates is not passed.

Return type:

Series
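get_eops and get_sops mirror each other, so one sketch covers both: group the available dates by the period that contains them and keep the last (or first) date per period. The sketch assumes that "end of period" means the last date actually present in each period (not the calendar period end), and that, when only start and end dates are given, the candidate dates are business days from `pd.bdate_range`. `period_bounds_sketch` and its `end_of_period` flag are hypothetical.

```python
from typing import Iterable, Optional, Union

import pandas as pd


def period_bounds_sketch(dates: Optional[Iterable] = None,
                         start_date: Optional[Union[str, pd.Timestamp]] = None,
                         end_date: Optional[Union[str, pd.Timestamp]] = None,
                         freq: str = "M", end_of_period: bool = True) -> pd.Series:
    """Last (or first) available date in each period of frequency `freq`."""
    if dates is None:
        if start_date is None or end_date is None:
            raise ValueError("Pass either `dates` or both `start_date` and `end_date`")
        dates = pd.bdate_range(start_date, end_date)  # assumed business-day calendar
    s = pd.Series(pd.to_datetime(list(dates))).drop_duplicates().sort_values()
    grouped = s.groupby(s.dt.to_period(freq))
    out = grouped.max() if end_of_period else grouped.min()
    return out.reset_index(drop=True)
```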

expanding_mean_with_nan(dfw, absolute=False)

Computes an expanding mean of the values in a wide DataFrame, pooled across all cross-sections, and returns the results. NaNs are ignored.

Parameters:
  • dfw (DataFrame) – “wide” dataframe with time index and cross-sections as columns.

  • absolute (bool) – if True, the expanding mean will be computed on the absolute value of each entry. Default is False.

Returns:

a list containing the expanding mean values, one for each timestamp in the index of dfw.

Return type:

List[float64]
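An expanding mean that skips NaNs and pools all cross-sections can be computed from two running totals: the cumulative sum of non-NaN values and the cumulative count of non-NaN values, up to and including each row. The sketch below follows that definition (an assumption about the exact pooling) and returns a NumPy array; call `.tolist()` for a plain list. `expanding_mean_with_nan_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def expanding_mean_with_nan_sketch(dfw: pd.DataFrame, absolute: bool = False) -> np.ndarray:
    """Mean of all non-NaN values up to each row, pooled across columns."""
    values = dfw.to_numpy(dtype=float)
    if absolute:
        values = np.abs(values)
    # Per-row totals of observed values and of the number of observations.
    row_sums = np.nansum(values, axis=1)
    row_counts = np.sum(~np.isnan(values), axis=1)
    cum_sums, cum_counts = np.cumsum(row_sums), np.cumsum(row_counts)
    # Rows before the first observation give 0/0, reported as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return cum_sums / cum_counts
```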

Submodules