macrosynergy.management.utils
- get_cid(ticker)
Returns the cross-sectional identifier (cid) from a ticker.
- Return <str>:
The cross-sectional identifier.
- get_xcat(ticker)
Returns the category (xcat) from a ticker.
- Return <str>:
The category.
- split_ticker(ticker, mode)
Returns either the cross-sectional identifier (cid) or the category (xcat) from a ticker. The function is overloaded to accept either a single ticker or an iterable (e.g. list, tuple, pd.Series, np.array) of tickers.
- Parameters:
- Return <str>:
The cross-sectional identifier or category.
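As a rough illustration of what these three helpers do, the sketch below splits tickers in plain Python. It assumes the JPMaQS convention that a ticker has the form `<cid>_<xcat>`, so everything before the first underscore is the cross-section. The name `split_ticker_sketch` and the `mode` values `"cid"` and `"xcat"` are hypothetical choices for this example and are not taken from the library.

```python
from typing import Iterable, List, Union


def split_ticker_sketch(
    ticker: Union[str, Iterable[str]], mode: str
) -> Union[str, List[str]]:
    """Return the cid or xcat part of one ticker, or of each ticker in an iterable."""
    if mode not in ("cid", "xcat"):  # assumed mode values, for illustration only
        raise ValueError("mode must be 'cid' or 'xcat'")
    if not isinstance(ticker, str):
        # Iterable input: apply the same split to every element.
        return [split_ticker_sketch(t, mode) for t in ticker]
    if "_" not in ticker:
        raise ValueError(f"'{ticker}' is not of the form <cid>_<xcat>")
    # Split on the first underscore only: the xcat may itself contain underscores.
    cid, xcat = ticker.split("_", 1)
    return cid if mode == "cid" else xcat


print(split_ticker_sketch("USD_EQXR_NSA", "cid"))   # USD
print(split_ticker_sketch("USD_EQXR_NSA", "xcat"))  # EQXR_NSA
```

Splitting on the first underscore only matters because category names such as `EQXR_NSA` contain underscores themselves.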
- form_full_url(url, params={})
Forms a full URL from a base URL and a dictionary of parameters. Useful for logging and debugging.
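A minimal sketch of the idea, using only the standard library. The helper name `form_full_url_sketch` is hypothetical, and the exact encoding rules applied by the library function are not stated on this page, so treat the output format as an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def form_full_url_sketch(url: str, params: Optional[dict] = None) -> str:
    """Append URL-encoded query parameters to a base URL."""
    if not params:
        # Nothing to append: return the base URL unchanged.
        return url
    return f"{url}?{urlencode(params)}"


# example.com is a placeholder host used purely for illustration.
print(form_full_url_sketch("https://example.com/api", {"ticker": "USD_EQXR_NSA", "n": 5}))
```

Logging the fully formed URL makes it easy to replay a failing request by hand.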
- common_cids(df, xcats)
Returns a list of cross-sectional identifiers (cids) for which the specified categories (xcats) are available.
- Parameters:
- Return <List[str]>:
List of cross-sectional identifiers for which all categories in xcats are available.
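The selection logic can be sketched with plain pandas as a set intersection over categories. The function name `common_cids_sketch` and the toy data are made up for this example; the library function may differ in validation and ordering of the result.

```python
import pandas as pd

# Toy long-format frame: EUR has no FXXR series.
df = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "GBP", "GBP"],
        "xcat": ["EQXR", "FXXR", "EQXR", "EQXR", "FXXR"],
        "real_date": pd.to_datetime(["2020-01-01"] * 5),
        "value": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)


def common_cids_sketch(df: pd.DataFrame, xcats: list) -> list:
    """Return the sorted cids that have data for every category in xcats."""
    # One set of cids per requested category.
    cid_sets = [set(df.loc[df["xcat"] == xcat, "cid"]) for xcat in xcats]
    if not cid_sets:
        return []
    return sorted(set.intersection(*cid_sets))


print(common_cids_sketch(df, ["EQXR", "FXXR"]))  # ['GBP', 'USD']
```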
- generate_random_date(start='1990-01-01', end='2020-01-01')
Generates a random date between two dates.
- Parameters:
- Return <str>:
The random date.
- get_dict_max_depth(d)
Returns the maximum depth of a dictionary.
- Parameters:
d (dict) – The dictionary to be searched.
- Return <int>:
The maximum depth of the dictionary.
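One common way to compute such a depth is a short recursion, sketched below. The convention that a flat (or empty) dictionary has depth 1 and a non-dictionary has depth 0 is an assumption of this example; the page does not state which convention the library uses.

```python
def dict_depth_sketch(d) -> int:
    """Depth of nested dictionaries; non-dict values contribute depth 0."""
    if not isinstance(d, dict):
        return 0
    # 1 for this level plus the deepest nested level (0 if there are no values).
    return 1 + max((dict_depth_sketch(v) for v in d.values()), default=0)


print(dict_depth_sketch({"a": 1}))                          # 1
print(dict_depth_sketch({"a": {"b": {"c": 1}}, "x": 2}))    # 3
```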
- rec_search_dict(d, key, match_substring=False, match_type=None)
Recursively searches a dictionary for a key and returns the value associated with it.
- Parameters:
d (dict) – The dictionary to be searched.
key (str) – The key to be searched for.
match_substring (bool) – If True, the function will return the value of the first key that contains the substring specified by the key parameter. If False, the function will return the value of the first key that matches the key parameter exactly. Default is False.
match_type – If not None, the function will look for a key that matches the search parameters and has the specified type. Default is None.
- Return <Any>:
The value associated with the key, or None if the key is not found.
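The search described above can be sketched as a depth-first traversal. This is an illustrative reimplementation, not the library's code: the name `rec_search_sketch` is hypothetical, and the sketch assumes that `match_type` is checked against the type of the value stored under the key.

```python
from typing import Any, Optional


def rec_search_sketch(
    d: Any, key: str, match_substring: bool = False, match_type: Optional[type] = None
) -> Any:
    """Depth-first search for a key; returns the first matching value or None."""
    if not isinstance(d, dict):
        return None
    for k, v in d.items():
        if match_substring:
            name_ok = isinstance(k, str) and key in k
        else:
            name_ok = k == key
        # Assumption: the type filter applies to the value, not to the key.
        type_ok = match_type is None or isinstance(v, match_type)
        if name_ok and type_ok:
            return v
        # No match at this level: descend into the value.
        found = rec_search_sketch(v, key, match_substring, match_type)
        if found is not None:
            return found
    return None


config = {"a": {"token": "abc", "nested": {"token_expiry": 3600}}, "b": 1}
print(rec_search_sketch(config, "token"))                          # abc
print(rec_search_sketch(config, "expiry", match_substring=True))   # 3600
```

Note that a stored value of None is indistinguishable from "not found" in this sketch.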
- standardise_dataframe(df, verbose=False)
Applies the standard JPMaQS Quantamental DataFrame format to a DataFrame.
- Parameters:
- Return <pd.DataFrame>:
The standardized DataFrame.
- Raises:
<TypeError> – If the input is not a pandas DataFrame.
<ValueError> – If the input DataFrame is not in the correct format.
- drop_nan_series(df, column='value', raise_warning=False)
Drops any series that are entirely NaNs. Raises a user warning if any series are dropped.
- Parameters:
- Return <pd.DataFrame | QuantamentalDataFrame>:
The cleaned DataFrame.
- Raises:
<TypeError> – If the input is not a pandas DataFrame.
<ValueError> – If the input DataFrame is not in the correct format.
- qdf_to_ticker_df(df, value_column='value')
Converts a standardized JPMaQS DataFrame to a wide format DataFrame with each column representing a ticker.
- Parameters:
df (DataFrame) – A standardised quantamental dataframe.
value_column (str) – The column to be used as the value column, defaults to “value”. If the specified column is not present in the DataFrame, a column named “value” will be used. If there is no column named “value”, the first column in the DataFrame will be used instead.
- Return <pd.DataFrame>:
The converted DataFrame.
- ticker_df_to_qdf(df)
Converts a wide format DataFrame (with each column representing a ticker) to a standardized JPMaQS DataFrame.
- Parameters:
df (DataFrame) – A wide format DataFrame.
- Return <pd.DataFrame>:
The converted DataFrame.
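The two conversions are inverse reshapes, which can be sketched with plain pandas `pivot` and `melt`. The toy data and variable names are made up for this example, and the sketch assumes tickers of the form `<cid>_<xcat>`; the library functions may add validation and column ordering that is not shown here.

```python
import pandas as pd

qdf = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "EUR"],
        "xcat": ["EQXR_NSA"] * 4,
        "real_date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Long -> wide: one column per ticker, indexed by date.
wide = qdf.assign(ticker=qdf["cid"] + "_" + qdf["xcat"]).pivot(
    index="real_date", columns="ticker", values="value"
)

# Wide -> long: melt the ticker columns back into rows, then split the ticker.
back = wide.reset_index().melt(
    id_vars="real_date", var_name="ticker", value_name="value"
)
back[["cid", "xcat"]] = back["ticker"].str.split("_", n=1, expand=True)
back = (
    back[["cid", "xcat", "real_date", "value"]]
    .sort_values(["cid", "xcat", "real_date"])
    .reset_index(drop=True)
)

print(wide)
print(back)
```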
- apply_slip(df, slip, cids=None, xcats=None, tickers=None, metrics=['value'], raise_error=True)
Applies a slip, i.e. a negative lag, to the DataFrame for the given cross-sections and categories, on the given metrics.
- Parameters:
- Return <QuantamentalDataFrame> target_df:
DataFrame with the slip applied.
- Raises:
<TypeError> – If the provided parameters are not of the expected type.
<ValueError> – If the provided parameters are semantically incorrect.
- downsample_df_on_real_date(df, groupby_columns=[], freq='M', agg='mean')
Downsamples a JPMaQS DataFrame.
- Parameters:
df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.
groupby_columns (List[str]) – a list of columns used to group the DataFrame.
freq (str) – frequency to downsample to. The native frequency of the datetimes in ‘real_date’ is business daily. Downsampling options include weekly (‘W’), monthly (‘M’) or quarterly (‘Q’). Default is ‘M’.
agg (str) – aggregation method. Must be one of “mean” (default), “median”, “min”, “max”, “first” or “last”.
- Return <pd.DataFrame>:
the downsampled DataFrame.
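A minimal sketch of this kind of downsampling, assuming that each period is stamped with its last available business day. The helper name `downsample_sketch` is hypothetical, and the sketch groups on pandas periods rather than calling the library, so details such as date stamping may differ from the real function.

```python
import pandas as pd

dates = pd.bdate_range("2020-01-01", "2020-02-28")
df = pd.DataFrame(
    {
        "cid": "USD",
        "xcat": "EQXR",
        "real_date": dates,
        "value": [float(i) for i in range(len(dates))],
    }
)


def downsample_sketch(
    df: pd.DataFrame, groupby_columns: list, period: str = "M", agg: str = "mean"
) -> pd.DataFrame:
    """Aggregate 'value' per group and calendar period ('W', 'M' or 'Q')."""
    # Map every business day to the calendar period it falls in.
    period_key = df["real_date"].dt.to_period(period).rename("period")
    grouped = df.groupby(list(groupby_columns) + [period_key])
    out = grouped.agg(
        real_date=("real_date", "max"),  # last available date in the period
        value=("value", agg),
    )
    return out.reset_index().drop(columns="period")


monthly = downsample_sketch(df, ["cid", "xcat"])
print(monthly)
```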
- update_df(df, df_add, xcat_replace=False)
Appends a standard DataFrame to a standard base DataFrame with ticker replacement on the intersection.
- Parameters:
df (DataFrame) – standardised base JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
df_add (DataFrame) – another standardised JPMaQS DataFrame, with the latest values, to be added with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’, and ‘value’. Columns that are present in the base DataFrame but not in the appended DataFrame will be populated with NaN values.
xcat_replace (bool) – if True, all series belonging to the categories in the added DataFrame will be replaced, rather than just the added tickers. N.B.: tickers are combinations of cross-sections and categories.
- Return <pd.DataFrame>:
standardised DataFrame with the latest values of the modified or newly defined tickers added.
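One possible reading of the two update modes is sketched below with plain pandas. It assumes that "replacement on the intersection" means that rows sharing the same cid, xcat and real_date are taken from the added DataFrame, and that `xcat_replace=True` first discards every base series of the affected categories. The name `update_df_sketch` and the toy data are hypothetical.

```python
import pandas as pd


def update_df_sketch(
    df: pd.DataFrame, df_add: pd.DataFrame, xcat_replace: bool = False
) -> pd.DataFrame:
    """Append df_add to df, letting df_add win where the two overlap."""
    if xcat_replace:
        # Category-level update: drop all base series of the added categories.
        df = df[~df["xcat"].isin(df_add["xcat"].unique())]
    combined = pd.concat([df, df_add], ignore_index=True)
    # On overlapping rows keep the later one, i.e. the row from df_add.
    combined = combined.drop_duplicates(
        subset=["cid", "xcat", "real_date"], keep="last"
    )
    return combined.sort_values(["cid", "xcat", "real_date"]).reset_index(drop=True)


base = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR"],
        "xcat": ["EQXR", "EQXR", "EQXR"],
        "real_date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
        "value": [1.0, 2.0, 3.0],
    }
)
new = pd.DataFrame(
    {
        "cid": ["USD", "USD"],
        "xcat": ["EQXR", "EQXR"],
        "real_date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
        "value": [20.0, 30.0],
    }
)

print(update_df_sketch(base, new))
print(update_df_sketch(base, new, xcat_replace=True))
```

In the second call the EUR series disappears as well, because the whole EQXR category is replaced by what the added DataFrame contains.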
- update_categories(df, df_add)
Method used to update the DataFrame on the category level.
- Parameters:
df (DataFrame) – base DataFrame.
df_add – appended DataFrame.
- reduce_df(df, xcats=None, cids=None, start=None, end=None, blacklist=None, out_all=False, intersect=False)
Filters a DataFrame by xcats and cids and notifies about missing xcats and cids.
- Parameters:
df (DataFrame) – standardized JPMaQS DataFrame with the necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and ‘value’.
xcats (Union[str, List[str]]) – extended categories to be filtered on. Default is all in the DataFrame.
cids (List[str]) – cross-sections to be checked on. Default is all in the DataFrame.
start (str) – string representing the earliest date. Default is None.
end (str) – string representing the latest date. Default is None.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.
out_all (bool) – if True, the function returns the reduced DataFrame and the selected/available xcats and cids. Default is False, i.e. only the DataFrame is returned.
intersect (bool) – if True, only retains cids that are available for all xcats. Default is False.
- Return <pd.DataFrame>:
reduced DataFrame with duplicates removed or, if out_all is True, the DataFrame together with the available and selected xcats and cids.
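The blacklist mechanism can be illustrated with a small pandas filter. This sketch makes two assumptions that the page does not confirm: each blacklist value is a `(start, end)` pair with inclusive bounds, and the appended number is separated from the cross-section code by an underscore (as in `"USD_1"`). The helper name `apply_blacklist_sketch` is hypothetical.

```python
import pandas as pd

dates = pd.bdate_range("2020-01-01", "2020-01-10")
df = pd.concat(
    [
        pd.DataFrame({"cid": cid, "xcat": "EQXR", "real_date": dates, "value": 1.0})
        for cid in ("USD", "EUR")
    ],
    ignore_index=True,
)

# Two separate blacklist periods for USD, distinguished by an appended number.
blacklist = {
    "USD_1": ("2020-01-02", "2020-01-03"),
    "USD_2": ("2020-01-09", "2020-01-10"),
}


def apply_blacklist_sketch(df: pd.DataFrame, blacklist: dict) -> pd.DataFrame:
    """Drop rows of a cross-section that fall inside any of its blacklist periods."""
    keep = pd.Series(True, index=df.index)
    for key, (start, end) in blacklist.items():
        cid = key.split("_")[0]  # "USD_1" -> "USD" (assumed key format)
        in_period = df["real_date"].between(pd.Timestamp(start), pd.Timestamp(end))
        keep &= ~((df["cid"] == cid) & in_period)
    return df[keep]


reduced = apply_blacklist_sketch(df, blacklist)
print(reduced.groupby("cid")["real_date"].count())
```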
- reduce_df_by_ticker(df, ticks=None, start=None, end=None, blacklist=None)
Filters a DataFrame by xcats and cids and notifies about missing xcats and cids.
- Parameters:
df (DataFrame) – standardized DataFrame with the following columns: ‘cid’, ‘xcat’, ‘real_date’.
ticks (List[str]) – tickers (cross-sections + base categories).
start (str) – string in ISO 8601 format representing the earliest date. Default is None.
end (str) – string in ISO 8601 format representing the latest date. Default is None.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.
- Return <pd.DataFrame>:
reduced DataFrame with duplicates removed.
- categories_df(df, xcats, cids=None, val='value', start=None, end=None, blacklist=None, years=None, freq='M', lag=0, fwin=1, xcat_aggs=['mean', 'mean'])
Creates a custom DataFrame of, in principle, two categories with the appropriate frequency and, if applicable, lags.
- Parameters:
df (DataFrame) – standardized JPMaQS DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’ and at least one column with values of interest.
xcats (List[str]) – extended categories involved in the custom DataFrame. The last category in the list represents the dependent variable, and the (n - 1) preceding categories will be the explanatory variable(s).
cids (List[str]) – cross-sections to be included. Default is all in the DataFrame.
val (str) – name of column that contains the values of interest. Default is ‘value’.
start (str) – earliest date in ISO 8601 format. Default is None, i.e. earliest date in DataFrame is used.
end (str) – latest date in ISO 8601 format. Default is None, i.e. latest date in DataFrame is used.
blacklist (dict) – cross-sections with date ranges that should be excluded from the DataFrame. If one cross-section has several blacklist periods, append numbers to the cross-section code.
years (int) – number of years over which data are aggregated. Supersedes the “freq” parameter and does not allow lags. Default is None, i.e. no multi-year aggregation.
freq (str) – letter denoting frequency at which the series are to be sampled. This must be one of ‘D’, ‘W’, ‘M’, ‘Q’, ‘A’. Default is ‘M’. The sampled dates will always be the last business day of the respective period.
lag (int) – lag (delay of arrival) of explanatory category(s) in periods as set by freq. Default is 0.
fwin (int) – forward moving average window of first category. Default is 1, i.e. no average. Note: This parameter is used mainly for target returns as dependent variable.
xcat_aggs (List[str]) – exactly two aggregation methods. Default is ‘mean’ for both. The same aggregation method, the first method in the parameter, will be used for all explanatory variables.
- Return <pd.DataFrame>:
custom DataFrame with category columns. All rows that contain NaNs will be excluded.
N.B.: The number of explanatory categories that can be included is not restricted and will be appended column-wise to the returned DataFrame. The order of the DataFrame’s columns will reflect the order of the categories list.
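The effect of the `lag` parameter and of the NaN exclusion can be sketched on an already aggregated monthly frame. The column names `SIGNAL` and `RETURN` and all values are made up for this example; the sketch only illustrates lagging the explanatory category within each cross-section and is not the library's implementation.

```python
import pandas as pd

month_ends = pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"])
idx = pd.MultiIndex.from_product(
    [["USD", "EUR"], month_ends], names=["cid", "real_date"]
)
dfc = pd.DataFrame(
    {
        "SIGNAL": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # explanatory category
        "RETURN": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # dependent category (last column)
    },
    index=idx,
)

lag = 1
out = dfc.copy()
# Shift the explanatory column within each cross-section, so that the signal
# of period t-1 is paired with the return of period t.
out["SIGNAL"] = out.groupby(level="cid")["SIGNAL"].shift(lag)
# Rows that contain NaNs (the first period of every cross-section) are excluded.
out = out.dropna()

print(out)
```

Shifting per cross-section rather than over the whole column prevents the last USD observation from leaking into the first EUR row.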
- categories_df_aggregation_helper(dfx, xcat_agg)
Helper method to down-sample each category in the DataFrame by aggregating over the intermediary dates according to a prescribed method.
- weeks_btwn_dates(start_date, end_date)
Returns the number of business weeks between two dates.
- months_btwn_dates(start_date, end_date)
Returns the number of months between two dates.
- years_btwn_dates(start_date, end_date)
Returns the number of years between two dates.
- quarters_btwn_dates(start_date, end_date)
Returns the number of quarters between two dates.
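The page does not state how partial periods are counted, so the following is only one plausible convention: counting calendar-boundary crossings with the standard library. The function names are hypothetical, and results from the library functions may differ for dates that do not fall on period boundaries.

```python
from datetime import date


def months_between_sketch(start_date: date, end_date: date) -> int:
    """Number of month boundaries crossed between the two dates."""
    return (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month)


def years_between_sketch(start_date: date, end_date: date) -> int:
    """Number of year boundaries crossed between the two dates."""
    return end_date.year - start_date.year


print(months_between_sketch(date(2020, 1, 15), date(2021, 3, 1)))  # 14
print(years_between_sketch(date(2020, 1, 15), date(2021, 3, 1)))   # 1
```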
- get_eops(dates=None, start_date=None, end_date=None, freq='M')
Returns a series of end-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.
- Parameters:
freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.
dates (Union[DatetimeIndex, Series, Iterable[Timestamp], None]) – The dates to be used to generate the end-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.
start_date (Union[str, Timestamp, None]) – The start date. Must be passed if dates is not passed.
- get_sops(dates=None, start_date=None, end_date=None, freq='M')
Returns a series of start-of-period dates for a given frequency. Dates can be passed as a series, index, a generic iterable or as a start and end date.
- Parameters:
freq (str) – The frequency string. Must be one of “D”, “W”, “M”, “Q”, “A”.
dates (Union[DatetimeIndex, Series, Iterable[Timestamp], None]) – The dates to be used to generate the start-of-period dates. Can be passed as a series, index, a generic iterable or as a start and end date.
start_date (Union[str, Timestamp, None]) – The start date. Must be passed if dates is not passed.
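Start-of-period and end-of-period dates on a business-day calendar can be sketched by grouping business days by calendar period and taking the first and last day of each group. The helper name `period_edges_sketch` is hypothetical, and the sketch assumes a plain Monday-to-Friday calendar without holidays; the library functions may treat calendars and truncated periods differently.

```python
import pandas as pd


def period_edges_sketch(start_date: str, end_date: str, period: str = "M"):
    """Return (start-of-period, end-of-period) business dates for each period."""
    bdays = pd.Series(pd.bdate_range(start_date, end_date))
    # Label every business day with the calendar period it belongs to.
    key = bdays.dt.to_period(period)
    grouped = bdays.groupby(key)
    sops = grouped.min().reset_index(drop=True)
    eops = grouped.max().reset_index(drop=True)
    return sops, eops


sops, eops = period_edges_sketch("2020-01-01", "2020-03-31")
print(sops.tolist())
print(eops.tolist())
```

If the date range ends in the middle of a period, this sketch reports the last available business day of that truncated period rather than its true end.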
- expanding_mean_with_nan(dfw, absolute=False)
Computes an expanding mean of a vector of floats and returns the results. NaNs will be consumed.
- Parameters:
- Return <List[float]> ret:
a list containing the mean values. The number of computed mean values held inside the list will correspond to the number of timestamps the series is defined over.
- Return type:
List[float64]
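A sketch of an expanding mean that skips NaNs, using cumulative sums and cumulative counts. It assumes that `dfw` is a wide DataFrame (dates as index, one column per cross-section), that the mean at each date pools all non-NaN values observed up to that date across all columns, and that `absolute=True` averages magnitudes. These assumptions are not confirmed by this page, and the name `expanding_mean_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def expanding_mean_sketch(dfw: pd.DataFrame, absolute: bool = False) -> list:
    """Expanding mean over all non-NaN values seen up to each row."""
    data = dfw.to_numpy(dtype=float)
    if absolute:
        data = np.abs(data)
    # Per-row sum and count of the values that are actually present.
    row_sums = np.nansum(data, axis=1)
    row_counts = np.sum(~np.isnan(data), axis=1)
    cum_sums = np.cumsum(row_sums)
    cum_counts = np.cumsum(row_counts)
    with np.errstate(invalid="ignore", divide="ignore"):
        # The mean is undefined (NaN) until the first value has been observed.
        means = np.where(cum_counts > 0, cum_sums / cum_counts, np.nan)
    return means.tolist()


dfw = pd.DataFrame(
    {"USD": [1.0, np.nan, 3.0], "EUR": [np.nan, np.nan, -5.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
print(expanding_mean_sketch(dfw))
print(expanding_mean_sketch(dfw, absolute=True))  # [1.0, 1.0, 3.0]
```

The returned list has one entry per row of `dfw`, matching the statement that the number of values corresponds to the number of timestamps.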
Submodules
- macrosynergy.management.utils.check_availability
- macrosynergy.management.utils.core
- macrosynergy.management.utils.df_utils
standardise_dataframe()
drop_nan_series()
qdf_to_ticker_df()
ticker_df_to_qdf()
apply_slip()
downsample_df_on_real_date()
update_df()
update_tickers()
update_categories()
reduce_df()
reduce_df_by_ticker()
categories_df_aggregation_helper()
categories_df()
years_btwn_dates()
quarters_btwn_dates()
months_btwn_dates()
weeks_btwn_dates()
get_eops()
get_sops()
- macrosynergy.management.utils.math