macrosynergy.management.utils.check_availability
Module for checking the availability of data in a Quantamental DataFrame. Includes functions for checking the start years and end dates of the series in a DataFrame, as well as for visualizing the results.
- check_availability(df, xcats=None, cids=None, start=None, start_size=None, end_size=None, start_years=True, missing_recent=True, use_last_businessday=True)
Wrapper for visualizing start and end dates of a filtered DataFrame.
- Parameters:
df (DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
xcats (List[str]) – extended categories to be checked on. Default is all in the DataFrame.
cids (List[str]) – cross sections to be checked on. Default is all in the DataFrame.
start (str) – string representing earliest considered date. Default is None.
start_size (Tuple[float]) – tuple of floats with width/length of the start years heatmap. Default is None (format adjusted to data).
end_size (Tuple[float]) – tuple of floats with width/length of the end dates heatmap. Default is None (format adjusted to data).
start_years (bool) – boolean indicating whether or not to display a chart of starting years for each cross-section and indicator. Default is True (display start years).
missing_recent (bool) – boolean indicating whether or not to display a chart of missing date numbers for each cross-section and indicator. Default is True (display missing days).
use_last_businessday (bool) – boolean indicating whether or not to use the last business day before today as the end date. Default is True.
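The parameters above describe a filter (xcats, cids, start) and a choice of end date (use_last_businessday) applied before any chart is drawn. The sketch below reproduces only those two steps with plain pandas. It is an illustration of the documented behaviour under stated assumptions, not the library's implementation: `filter_frame` and `last_business_day` are hypothetical helper names, and the toy tickers are made up.

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def filter_frame(df, xcats=None, cids=None, start=None):
    """Keep only the requested categories, cross-sections and dates (hypothetical helper)."""
    out = df
    if xcats is not None:
        out = out[out["xcat"].isin(xcats)]
    if cids is not None:
        out = out[out["cid"].isin(cids)]
    if start is not None:
        out = out[out["real_date"] >= pd.Timestamp(start)]
    return out.reset_index(drop=True)


def last_business_day(today):
    """Most recent business day strictly before `today` (hypothetical helper)."""
    return pd.Timestamp(today).normalize() - BDay(1)


# Toy frame with the three required columns plus a value column.
toy = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "EUR"],
        "xcat": ["CPI", "GDP", "CPI", "GDP"],
        "real_date": pd.to_datetime(
            ["2020-01-02", "2021-03-01", "2019-06-03", "2022-02-01"]
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

filtered = filter_frame(toy, xcats=["CPI"], start="2020-01-01")
end_date = last_business_day("2024-01-08")  # 2024-01-08 is a Monday
print(filtered)
print(end_date.date())
```

Passing a fixed date to `last_business_day` keeps the example reproducible; in practice the reference date would be today's date.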
- missing_in_df(df, xcats=None, cids=None)
Print missing cross-sections and categories.
- Parameters:
df (QuantamentalDataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
xcats (List[str]) – extended categories to be checked on. Default is all in the DataFrame.
cids (List[str]) – cross sections to be checked on. Default is all in the DataFrame.
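One way to picture this check is as two set differences: requested categories that never appear in the DataFrame, and, for each category that does appear, requested cross-sections that have no rows. The following is a minimal sketch of that idea with plain pandas; `find_missing` is a hypothetical helper, and the printed wording is an assumption rather than the library's actual output.

```python
import pandas as pd


def find_missing(df, xcats, cids):
    """Return categories absent from df and, per present category, absent cids."""
    present_xcats = set(df["xcat"].unique())
    missing_xcats = sorted(set(xcats) - present_xcats)
    missing_cids = {}
    for xcat in sorted(set(xcats) & present_xcats):
        have = set(df.loc[df["xcat"] == xcat, "cid"])
        missing_cids[xcat] = sorted(set(cids) - have)
    return missing_xcats, missing_cids


# Toy frame: GDP is only available for USD, FXXR is absent altogether.
toy = pd.DataFrame(
    {
        "cid": ["USD", "EUR", "USD"],
        "xcat": ["CPI", "CPI", "GDP"],
        "real_date": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-02"]),
    }
)

missing_xcats, missing_cids = find_missing(
    toy, xcats=["CPI", "GDP", "FXXR"], cids=["USD", "EUR", "JPY"]
)
print("Missing xcats:", missing_xcats)
for xcat, absent in missing_cids.items():
    print(f"Missing cids for {xcat}:", absent)
```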
- check_startyears(df)
DataFrame with starting years across all extended categories and cross-sections.
- Parameters:
df (DataFrame) – standardized DataFrame with the following necessary columns: ‘cid’, ‘xcat’, ‘real_date’.
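A table of starting years can be sketched as a group-wise minimum of ‘real_date’ reshaped into cross-section rows and category columns. The example below assumes that layout and uses only pandas; it is not taken from the library source, and the toy data are invented.

```python
import pandas as pd

# Toy frame: USD has no GDP series at all.
toy = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "EUR", "EUR"],
        "xcat": ["CPI", "CPI", "CPI", "GDP", "GDP"],
        "real_date": pd.to_datetime(
            ["2001-05-01", "2003-01-02", "1999-12-31", "2010-07-01", "2012-01-03"]
        ),
    }
)

# Earliest date per (cid, xcat), reduced to its year, then pivoted so that
# cross-sections are rows and categories are columns.
start_years = (
    toy.groupby(["cid", "xcat"])["real_date"].min().dt.year.unstack("xcat")
)
print(start_years)
```

Combinations with no data come out as NaN, which is why the resulting columns are floating point rather than integer.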
- check_enddates(df)
DataFrame with end dates across all extended categories and cross sections.
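The end-date table can be pictured as the group-wise maximum of ‘real_date’ in the same rows-by-columns layout. This is a hedged pandas sketch of that idea, not the library's code; the exact return format of the real function (for example, dates as strings) is not stated here and is not assumed.

```python
import pandas as pd

# Toy frame: USD has no GDP series at all.
toy = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "EUR", "EUR"],
        "xcat": ["CPI", "CPI", "CPI", "GDP", "GDP"],
        "real_date": pd.to_datetime(
            ["2001-05-01", "2003-01-02", "1999-12-31", "2010-07-01", "2012-01-03"]
        ),
    }
)

# Latest date per (cid, xcat), pivoted to cross-section rows and category columns.
end_dates = toy.groupby(["cid", "xcat"])["real_date"].max().unstack("xcat")
print(end_dates)
```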
- business_day_dif(df, maxdate)
Number of business days between two respective business dates.
- Parameters:
df (DataFrame) – DataFrame with cross-section rows and category columns. Each cell in the DataFrame will correspond to the start date of the respective series.
maxdate (Timestamp) – maximum release date found in the received DataFrame. In principle, all series should have values up until the respective business date. The difference will represent possible missing values.
- Returns:
pd.DataFrame – DataFrame consisting of business day differences for all series.
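As a rough illustration of a business-day difference, the sketch below counts weekdays between each cell's date and a reference date with `numpy.busday_count`. It is an assumption about how such a count could be computed, not the library's implementation; `business_days_behind` is a hypothetical name, and public holidays are ignored.

```python
import numpy as np
import pandas as pd


def business_days_behind(dates, maxdate):
    """Weekdays from each cell's date up to maxdate; NaN where the cell is empty."""
    max_day = pd.Timestamp(maxdate).to_datetime64().astype("datetime64[D]")

    def count(cell):
        if pd.isna(cell):
            return np.nan
        day = pd.Timestamp(cell).to_datetime64().astype("datetime64[D]")
        # busday_count counts weekdays in the half-open interval [day, max_day).
        return float(np.busday_count(day, max_day))

    return dates.apply(lambda col: col.map(count))


# Toy table: cross-section rows, category columns, one date per cell.
dates = pd.DataFrame(
    {
        "CPI": [pd.Timestamp("2024-01-05"), pd.Timestamp("2024-01-08")],
        "GDP": [pd.Timestamp("2024-01-02"), pd.NaT],
    },
    index=["EUR", "USD"],
)

diff = business_days_behind(dates, "2024-01-08")
print(diff)
```

A cell equal to the reference date yields 0, so larger values flag series that lag further behind.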