macrosynergy.management.simulate.simulate_vintage_data#

Module with functionality for generating mock quantamental data vintages for testing purposes.

class VintageData(ticker, cutoff='2020-12-31', release_lags=[15, 30], number_firsts=24, shortest=36, freq='M', start_value=100, trend_ar=5, sd_ar=3.4641016151377544, seasonal=None, added_dates=12)[source]#

Bases: object

Creates standardized dataframes of single-ticker vintages, in both grade 1 and grade 2 form.

Parameters:
  • ticker – ticker name

  • cutoff – last possible release date. The format must be ‘%Y-%m-%d’. All other dates are calculated from this one. Default is ‘2020-12-31’.

  • release_lags – list of integers in ascending order denoting the lags of the first, second, etc. release in calendar days. The default is a first release after 15 days and a revision after 30 days. If a release date falls on a weekend, it is delayed to the following Monday.

  • number_firsts – number of first-release vintages in the simulated data set. Default is 24.

  • shortest – number of observations in the first (shortest) vintage. Default is 36.

  • freq – letter denoting the frequency of the vintage data. Must be one of ‘M’ (monthly, default), ‘Q’ (quarterly) or ‘W’ (weekly).

  • start_value – expected first value of the random series. Default is 100.

  • trend_ar – annualized trend. Default is 5, i.e. a linear drift of 5% per year, expressed as a percentage of the start value. If the start value is not positive, the linear trend is added as an absolute number instead.

  • sd_ar – annualized standard deviation. Default is sqrt(12).

  • seasonal – adds a seasonal pattern (applying a linear factor from low to high through the year), with the value denoting the average % seasonal factor through the year. Default is None. The seasonal pattern only makes sense for values that are strictly positive and are interpreted as indices.

  • added_dates – number of added first release dates, used for grade 2 dataframe generation. Default is 12.
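As a rough illustration of how the annualized trend_ar and sd_ar might translate into per-period quantities, the sketch below assumes square-root-of-time scaling for the standard deviation and an even split of the annual drift across periods. The helper per_period_params and the PERIODS mapping are hypothetical and not part of the library; the actual implementation may differ.

```python
import math

# Assumed number of periods per year for each frequency letter (hypothetical mapping).
PERIODS = {"M": 12, "Q": 4, "W": 52}


def per_period_params(start_value=100, trend_ar=5, sd_ar=math.sqrt(12), freq="M"):
    """Return (drift per period, standard deviation per period).

    Assumptions of this sketch, not confirmed by the library:
    - the annual drift is trend_ar % of start_value when start_value > 0,
      otherwise trend_ar is used as an absolute number;
    - the annual drift is split evenly across periods;
    - the standard deviation scales with the square root of time.
    """
    n = PERIODS[freq]
    annual_drift = start_value * trend_ar / 100 if start_value > 0 else trend_ar
    return annual_drift / n, sd_ar / math.sqrt(n)


drift, sd = per_period_params()
```

Under these assumptions, the default sd_ar of sqrt(12) corresponds to a standard deviation of 1 per monthly observation.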

static date_check(date_string)[source]#

Validates that the date passed is a valid timestamp expression and converts it to the required form ‘%Y-%m-%d’.

Parameters:

date_string – valid date expression. For instance, “1st January, 2000.”

Raises:
  • <TypeError> – if the date_string is not a string.

  • <ValueError> – if the date_string is not in the correct format.
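The validation described above can be sketched with the standard library alone. The function check_date and the short list of accepted formats are hypothetical stand-ins; the real method may rely on a different parser and accept a wider range of expressions.

```python
import re
from datetime import datetime

# Formats accepted by this sketch only (an assumption, not the library's list).
_FORMATS = ("%Y-%m-%d", "%d %B, %Y", "%d %B %Y")


def check_date(date_string):
    """Return the date as 'YYYY-MM-DD', raising on invalid input."""
    if not isinstance(date_string, str):
        raise TypeError("date_string must be a string")
    # Remove a trailing full stop and ordinal suffixes such as "1st" -> "1".
    cleaned = date_string.strip().rstrip(".")
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", cleaned)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognised date expression: {date_string!r}")
```

A call such as check_date("1st January, 2000.") would then yield the normalized string form, while a non-string argument raises TypeError.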

static week_day(rel_date, day)[source]#

seasonal_adj(obs_dates, seas_factors, values)[source]#

Method used to seasonally adjust the series. Economic data can vary according to the season.

Parameters:
  • obs_dates – observation dates for the series.

  • seas_factors – seasonal factors.

  • values – existing values that have not been seasonally adjusted.

Return <List[float]>:

a list of values that have been seasonally adjusted.
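One plausible reading of the seasonal pattern is a set of monthly percentage factors that rise linearly through the year and average to the seasonal value, applied multiplicatively to the series. The sketch below follows that reading; linear_seasonal_factors and apply_seasonal are hypothetical names, and both the factor shape and the multiplicative form are assumptions rather than the library's confirmed behaviour.

```python
from datetime import date


def linear_seasonal_factors(avg_pct):
    """Twelve monthly factors in %, rising linearly from 0 in January.

    The step is chosen so that the twelve factors average to avg_pct
    (assumption of this sketch).
    """
    step = 2 * avg_pct / 11
    return [step * month_index for month_index in range(12)]


def apply_seasonal(obs_dates, seas_factors, values):
    """Scale each value by the % factor of its observation month."""
    return [
        value * (1 + seas_factors[obs_date.month - 1] / 100)
        for obs_date, value in zip(obs_dates, values)
    ]


factors = linear_seasonal_factors(5)
adjusted = apply_seasonal(
    [date(2020, 1, 31), date(2020, 12, 31)], factors, [100.0, 100.0]
)
```

Because the factors are multiplicative percentages, this form is only meaningful for strictly positive, index-like values, in line with the note on the seasonal parameter.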

make_grade1()[source]#

make_graded(grading, upgrades=[])[source]#

Simulates an explicitly graded dataframe with a column ‘grading’.

Parameters:
  • grading – list of grades, ordered from lowest to highest. Must be a subset of [3, 2.2, 2.1, 1].

  • upgrades – indices of the release dates at which the series is upgraded. Its length must be the length of grading minus one. Default is an empty list.
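The constraints on grading and upgrades can be illustrated with a small stand-alone sketch. The function grades_for_releases is hypothetical, and it assumes that the release at an upgrade index already carries the higher grade; the library may treat the boundary differently.

```python
from bisect import bisect_right

ALLOWED_GRADES = [3, 2.2, 2.1, 1]  # ordered from lowest to highest grade


def grades_for_releases(n_releases, grading, upgrades):
    """Assign a grade to each release index 0..n_releases-1."""
    if not set(grading) <= set(ALLOWED_GRADES):
        raise ValueError("grading must be a subset of [3, 2.2, 2.1, 1]")
    if len(upgrades) != len(grading) - 1:
        raise ValueError("upgrades must have length len(grading) - 1")
    if list(upgrades) != sorted(upgrades):
        raise ValueError("upgrades must be in ascending order")
    # The number of upgrade indices at or before i selects the grade in force.
    return [grading[bisect_right(upgrades, i)] for i in range(n_releases)]


grades = grades_for_releases(6, [3, 2.1, 1], [2, 4])
```

With a single grade, upgrades must be empty, which matches the default in the signature above.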

static map_weekday(date)[source]#

make_grade2()[source]#

Constructs a dataframe that pairs each observation date with its corresponding release date(s). The release dates are computed from the observation date and the lags specified in the field “release_lags”.

Return <pd.DataFrame>:

Will return the DataFrame with the additional columns.
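The release-date calculation described for this method can be sketched as follows, combining the calendar-day lags with the weekend-to-Monday rule from the release_lags parameter. The helpers to_weekday and release_dates are hypothetical and use only the standard library; they do not reproduce the dataframe the method returns.

```python
from datetime import date, timedelta


def to_weekday(d):
    """Roll Saturday and Sunday forward to the following Monday."""
    shift = {5: 2, 6: 1}.get(d.weekday(), 0)  # Monday is 0, Sunday is 6
    return d + timedelta(days=shift)


def release_dates(obs_date, release_lags=(15, 30)):
    """Release dates for one observation date, one per lag in calendar days."""
    return [to_weekday(obs_date + timedelta(days=lag)) for lag in release_lags]


# Observation date 31 October 2020 with the default lags of 15 and 30 days.
releases = release_dates(date(2020, 10, 31))
```

Here the 15-day lag lands on Sunday 15 November 2020 and is therefore moved to Monday 16 November, while the 30-day lag lands on Monday 30 November and is left unchanged.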

add_ticker_parts(df)[source]#

Method used to add the associated tickers.

Parameters:

df – standardized dataframe.

Return <pd.DataFrame>:

Will return the DataFrame with the additional columns.