macrosynergy.management.simulate#

make_qdf(df_cids, df_xcats, back_ar=0)[source]#

Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’.

Parameters:
  • df_cids (DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.

  • df_xcats (DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of a category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set auto-correlation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.

  • back_ar (float) – float between 0 and 1 denoting set auto-correlation of the background factor. Default is zero.

Return <pd.DataFrame>:

basic quantamental DataFrame according to specifications.
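As a rough illustration of the two parameter tables described above, the following sketch builds df_cids and df_xcats with pandas. All numeric values are made up for the example. The import path is assumed from this page's module name, and the call is guarded so the snippet still runs when the package is not installed.

```python
import pandas as pd

cids = ["AUD", "CAD", "GBP"]
xcats = ["XR", "CRY"]

# One row per cross-section; the values are illustrative only.
df_cids = pd.DataFrame.from_dict(
    {
        "AUD": ["2010-01-01", "2020-12-31", 0.5, 2.0],
        "CAD": ["2011-01-01", "2020-11-30", 0.0, 1.0],
        "GBP": ["2012-01-01", "2020-11-30", -0.2, 0.5],
    },
    orient="index",
    columns=["earliest", "latest", "mean_add", "sd_mult"],
)

# One row per category, with the two extra columns ar_coef and back_coef.
df_xcats = pd.DataFrame.from_dict(
    {
        "XR": ["2010-01-01", "2020-12-31", 0.0, 1.0, 0.8, 0.3],
        "CRY": ["2011-01-01", "2020-10-30", 1.0, 2.0, 0.9, 0.5],
    },
    orient="index",
    columns=["earliest", "latest", "mean_add", "sd_mult", "ar_coef", "back_coef"],
)

# Assumption: the function is importable from the module this page documents.
try:
    from macrosynergy.management.simulate import make_qdf
except ImportError:
    make_qdf = None

if make_qdf is not None:
    dfd = make_qdf(df_cids, df_xcats, back_ar=0.75)
    print(dfd.head())
```

Keeping the cross-section and category parameters in separate tables means a single pair of small DataFrames describes every cid and xcat combination.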

make_test_df(cids=['AUD', 'CAD', 'GBP'], xcats=['XR', 'CRY'], start='2010-01-01', end='2020-12-31', style='any')[source]#

Generates a test dataframe with pre-defined values. These values are meant to be used for testing purposes only. The function generates a standard quantamental dataframe in which the value column is populated with pre-defined values. These values are simple lines or waves that are easy to identify and differentiate in a plot.

Parameters:
  • cids (List[str]) – A list of strings for cids.

  • xcats (List[str]) – A list of strings for xcats.

  • start (str) – An ISO-formatted date string.

  • end (str) – An ISO-formatted date string.

  • style (str) – A string that specifies the type of line to generate. Current choices are: ‘linear’, ‘decreasing-linear’, ‘sharp-hill’, ‘four-bit-sine’, ‘sine’, ‘cosine’, ‘sawtooth’, ‘any’. See macrosynergy.management.simulate.simulate_quantamental_data.generate_lines().

Return type:

QuantamentalDataFrame

dataframe_generator(df_cids, df_xcats, cid, xcat)[source]#

Helper function used to construct the quantamental DataFrame for an individual cross-section and category.

Parameters:
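  • df_cids (DataFrame) – DataFrame with parameters by cid, as described for make_qdf().

  • df_xcats (DataFrame) – DataFrame with parameters by xcat, as described for make_qdf().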

  • cid (str) – individual cross-section.

  • xcat (str) – individual category.

Return <Tuple[pd.DataFrame, pd.DatetimeIndex]>:

Tuple containing the quantamental DataFrame and a DatetimeIndex of the business days.

generate_lines(sig_len, style='linear')[source]#

Returns a numpy array of a line with a given length.

Parameters:
  • sig_len (int) – The number of elements in the returned array.

  • style (str) – The style of the line. Default ‘linear’. Current choices are: linear, decreasing-linear, sharp-hill, four-bit-sine, sine, cosine, sawtooth. Adding “inv” or “inverted” to the style will return the inverted version of that line. For example, ‘inv-sawtooth’ or ‘inverted sawtooth’ will return the inverted sawtooth line. ‘any’ will return a random line. ‘all’ will return a list of all the available styles.

Return <Union[np.ndarray, List[str]]>:

A numpy array of the line. If style is ‘all’, then a list (of strings) of all the available styles is returned.

NOTE: It is indeed possible to request an “inverted linear” or “inverted decreasing-linear” line. They’re just there for completeness and readability.

Return type:

Union[ndarray, List[str]]
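To make the style names and the “inv”/“inverted” prefix more concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function name sketch_line, the 0 to 1 value range, the shape formulas, and the definition of inversion as mirroring within that range are all assumptions chosen for illustration, and only four of the documented styles are covered.

```python
import math
from typing import List


def sketch_line(sig_len: int, style: str = "linear") -> List[float]:
    """Hypothetical stand-in for a few of the documented line styles."""
    if sig_len < 2:
        raise ValueError("sig_len must be at least 2")

    # Detect an inversion prefix such as 'inv-sawtooth' or 'inverted sawtooth'.
    name = style.lower()
    inverted = False
    for prefix in ("inverted-", "inverted ", "inv-", "inv "):
        if name.startswith(prefix):
            inverted, name = True, name[len(prefix):]
            break

    # Assumed shapes, each scaled to the range 0..1.
    shapes = {
        "linear": lambda x: x,
        "decreasing-linear": lambda x: 1 - x,
        "sine": lambda x: 0.5 + 0.5 * math.sin(2 * math.pi * x),
        "sawtooth": lambda x: (4 * x) % 1.0,
    }
    if name not in shapes:
        raise ValueError(f"unsupported style in this sketch: {style!r}")

    # Evenly spaced positions from 0 to 1 inclusive.
    t = [i / (sig_len - 1) for i in range(sig_len)]
    line = [shapes[name](x) for x in t]

    # Assumed meaning of inversion: mirror the values within the 0..1 range.
    if inverted:
        line = [1 - v for v in line]
    return line


print(sketch_line(5, "linear"))
print(sketch_line(5, "inv-linear") == sketch_line(5, "decreasing-linear"))
```

Under this assumed definition an inverted linear line coincides with a decreasing-linear one, which is one way to read the note about those styles existing only for completeness.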

make_qdf_black(df_cids, df_xcats, blackout)[source]#

Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’. In this DataFrame, the ‘value’ column consists of binary values denoting whether the cross-section is active on the corresponding dates.

Parameters:
  • df_cids (DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.

  • df_xcats (DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of a category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set autocorrelation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.

  • blackout (dict) – Dictionary defining the blackout periods for each cross-section. The expected form of the dictionary is: {‘AUD’: (Timestamp(‘2000-01-13 00:00:00’), Timestamp(‘2000-01-13 00:00:00’)), ‘USD_1’: (Timestamp(‘2000-01-03 00:00:00’), Timestamp(‘2000-01-05 00:00:00’)), ‘USD_2’: (Timestamp(‘2000-01-09 00:00:00’), Timestamp(‘2000-01-10 00:00:00’)), ‘USD_3’: (Timestamp(‘2000-01-12 00:00:00’), Timestamp(‘2000-01-12 00:00:00’))}. The values of the dictionary are tuples consisting of the start and end date of the respective blackout period. A cross-section can have more than one blackout period on a single category, in which case each key is indexed to indicate the number of the period.

Return <pd.DataFrame>:

basic quantamental DataFrame according to specifications with binary values.
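The key convention of the blackout dictionary (plain ‘AUD’ versus indexed ‘USD_1’, ‘USD_2’, …) can be sketched as follows. This is a standalone illustration rather than the library's code: it uses datetime.date in place of pandas Timestamps, the helper name blackout_flags is hypothetical, and it assumes that 1 marks a date inside a blackout period.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Blackout = Dict[str, Tuple[date, date]]


def blackout_flags(cid: str, days: List[date], blackout: Blackout) -> List[int]:
    """Return 1 for days inside any blackout period of `cid`, else 0."""
    # Keys are either the plain cid ('AUD') or indexed ('USD_1', 'USD_2', ...).
    periods = [span for key, span in blackout.items() if key.split("_")[0] == cid]
    return [int(any(start <= d <= end for start, end in periods)) for d in days]


# Same periods as in the parameter description, with dates instead of Timestamps.
blackout = {
    "AUD": (date(2000, 1, 13), date(2000, 1, 13)),
    "USD_1": (date(2000, 1, 3), date(2000, 1, 5)),
    "USD_2": (date(2000, 1, 9), date(2000, 1, 10)),
    "USD_3": (date(2000, 1, 12), date(2000, 1, 12)),
}

# Business days from 3 January 2000 to 13 January 2000.
all_days = [date(2000, 1, 3) + timedelta(days=i) for i in range(11)]
days = [d for d in all_days if d.weekday() < 5]

print(blackout_flags("USD", days, blackout))
print(blackout_flags("AUD", days, blackout))
```

Because 9 January 2000 is a Sunday, only the Monday of the second USD period appears among the business days.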

simulate_ar(nobs, mean=0, sd_mult=1, ar_coef=0.75)[source]#

Create an auto-correlated data-series as numpy array.

Parameters:
  • nobs (int) – number of observations.

  • mean (float) – mean of values, default is zero.

  • sd_mult (float) – standard deviation multiplier of values, default is 1. This affects non-zero means.

  • ar_coef (float) – autoregression coefficient (between 0 and 1): default is 0.75.

Return <np.ndarray>:

autocorrelated data series.
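For readers unfamiliar with autoregressive series, the following is a textbook AR(1) sketch using only the standard library. It is not the library's implementation: the name sketch_ar and the seed argument are hypothetical, and here sd_mult scales only the deviations around the mean, whereas the parameter note above says that in the library it also affects non-zero means.

```python
import random
from typing import List, Optional


def sketch_ar(
    nobs: int,
    mean: float = 0.0,
    sd_mult: float = 1.0,
    ar_coef: float = 0.75,
    seed: Optional[int] = None,
) -> List[float]:
    """Textbook AR(1): x[t] = ar_coef * x[t-1] + shock[t]."""
    if not 0 <= ar_coef < 1:
        raise ValueError("ar_coef must be in [0, 1) for a stationary series")
    rng = random.Random(seed)

    # Scale the shocks so that the stationary variance of x is 1.
    shock_sd = (1 - ar_coef**2) ** 0.5

    x = rng.gauss(0, 1)  # draw the first value from the stationary distribution
    series = []
    for _ in range(nobs):
        series.append(mean + sd_mult * x)
        x = ar_coef * x + shock_sd * rng.gauss(0, 1)
    return series


print(sketch_ar(5, mean=1.0, sd_mult=2.0, ar_coef=0.75, seed=42))
```

A higher ar_coef makes consecutive values more alike, while ar_coef of 0 reduces the series to independent draws.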

class VintageData(ticker, cutoff='2020-12-31', release_lags=[15, 30], number_firsts=24, shortest=36, freq='M', start_value=100, trend_ar=5, sd_ar=3.4641016151377544, seasonal=None, added_dates=12)[source]#

Bases: object

Creates standardized dataframe of single-ticker vintages. This class creates standardized grade 1 and grade 2 vintage data.

Parameters:
  • ticker – ticker name

  • cutoff – last possible release date. The format must be ‘%Y-%m-%d’. All other dates are calculated from this one. Default is end 2020.

  • release_lags – list of integers in ascending order denoting lags of the first, second etc. release in (calendar) days. Default is first release after 15 days and revision after 30 days. If a release date falls on a weekend, it is delayed to Monday.

  • number_firsts – number of first-release vintages in the simulated data set. Default is 24.

  • shortest – number of observations in the first (shortest) vintage. Default is 36.

  • freq – letter denoting the frequency of the vintage data. Must be one of ‘M’ (monthly, default), ‘Q’ (quarterly) or ‘W’ (weekly).

  • start_value – expected first value of the random series. Default is 100.

  • trend_ar – annualized trend. Default is 5% linear drift per year. This is applied to the start value. If the start value is not positive, the linear trend is added as a number.

  • sd_ar – annualized standard deviation. Default is sqrt(12).

  • seasonal – adds seasonal pattern (applying linear factor from low to high through the year) with value denoting the average % seasonal factor through the year. Default is None. The seasonal pattern only makes sense for values that are strictly positive and are interpreted as indices.

  • added_dates – number of added first release dates, used for grade 2 dataframe generation. Default is 12.
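The release_lags rule (calendar-day lags, with weekend dates delayed to Monday) can be illustrated with a short standalone sketch. The helper name release_dates is hypothetical and the code is not taken from the class; it assumes the lags are counted from the observation date.

```python
from datetime import date, timedelta
from typing import List


def release_dates(obs_date: date, release_lags: List[int]) -> List[date]:
    """Add each calendar-day lag and push weekend dates to the next Monday."""
    out = []
    for lag in release_lags:
        rel = obs_date + timedelta(days=lag)
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        if rel.weekday() >= 5:
            rel += timedelta(days=7 - rel.weekday())
        out.append(rel)
    return out


# Observation date 31 October 2020 with the default lags of 15 and 30 days.
print(release_dates(date(2020, 10, 31), [15, 30]))
```

In this example the 15-day lag lands on Sunday 15 November 2020 and is moved to Monday 16 November, while the 30-day lag lands on Monday 30 November and is left unchanged.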

static date_check(date_string)[source]#

Validates that the dates passed are valid timestamp expressions and converts them to the required form ‘%Y-%m-%d’.

Parameters:

date_string – valid date expression. For instance, “1st January, 2000.”

Raises:
  • <TypeError> – if the date_string is not a string.

  • <ValueError> – if the date_string is not in the correct format.

static week_day(rel_date, day)[source]#

seasonal_adj(obs_dates, seas_factors, values)[source]#

Method used to seasonally adjust the series. Economic data can vary according to the season.

Parameters:
  • obs_dates – observation dates for the series.

  • seas_factors – seasonal factors.

  • values – existing values that have not been seasonally adjusted.

Return <List[float]>:

list of values which have been seasonally adjusted.

make_grade1()[source]#

make_graded(grading, upgrades=[])[source]#

Simulates an explicitly graded dataframe with a column ‘grading’.

Parameters:
  • grading – list of grades used, from lowest to highest, to populate the ‘grading’ column. Must be a subset of [3, 2.2, 2.1, 1].

  • upgrades – indices of release dates at which the series upgrades. Must have length of grading minus one. Default is an empty list.

static map_weekday(date)[source]#

make_grade2()[source]#

Method used to construct a dataframe that consists of each respective observation date and the corresponding release date(s) (the release dates are computed using the observation date and the time-period(s) specified in the field “release_lags”).

Return <pd.DataFrame>:

Will return the DataFrame with the additional columns.

add_ticker_parts(df)[source]#

Method used to add the associated tickers.

Parameters:

df – standardised dataframe.

Return <pd.DataFrame>:

Will return the DataFrame with the additional columns.

Submodules#