macrosynergy.management.simulate.simulate_quantamental_data#
Module with functionality for generating mock quantamental data for testing purposes.
- simulate_ar(nobs, mean=0, sd_mult=1, ar_coef=0.75)[source]#
Create an auto-correlated data-series as numpy array.
- Parameters:
- Return <np.ndarray>:
autocorrelated data series.
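As a rough illustration of what an auto-correlated series looks like, the following is a minimal, dependency-free sketch of a first-order autoregressive process. The name `ar1_sketch`, the `seed` argument, the unit-variance scaling of the shocks and the list return type are all assumptions made for this example. It is not the library's implementation, which returns a numpy array.

```python
import random


def ar1_sketch(nobs, mean=0.0, sd_mult=1.0, ar_coef=0.75, seed=None):
    """Hypothetical stand-in for simulate_ar; returns a list, not an ndarray.

    Assumes 0 <= ar_coef < 1.
    """
    rng = random.Random(seed)
    # Scale the shocks so that the underlying process has unit variance.
    shock_sd = (1.0 - ar_coef ** 2) ** 0.5
    # Draw the starting value from the stationary distribution.
    x = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(nobs):
        # Shift and rescale the unit-variance process.
        series.append(mean + sd_mult * x)
        # Each new value carries over a share ar_coef of the previous one.
        x = ar_coef * x + rng.gauss(0.0, shock_sd)
    return series


values = ar1_sketch(500, mean=2.0, sd_mult=0.5, ar_coef=0.75, seed=42)
```

A higher `ar_coef` makes consecutive values more similar, while `mean` and `sd_mult` only shift and rescale the series.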
- dataframe_generator(df_cids, df_xcats, cid, xcat)[source]#
Helper function used to construct the quantamental DataFrame.
- make_qdf(df_cids, df_xcats, back_ar=0)[source]#
Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’.
- Parameters:
df_cids (DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of a category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set auto-correlation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.
back_ar (float) – float between 0 and 1 denoting set auto-correlation of the background factor. Default is zero.
- Return <pd.DataFrame>:
basic quantamental DataFrame according to specifications.
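The two parameter frames described above can be assembled with plain pandas. The sketch below only builds the inputs. The cross-sections, categories, dates and numbers are made-up illustration values, and the call to `make_qdf` is left as a comment because it needs the macrosynergy package to be installed.

```python
import pandas as pd

# One row per cross-section, columns as listed for df_cids.
df_cids = pd.DataFrame(
    data=[
        ["2010-01-01", "2020-12-31", 0.5, 2.0],
        ["2011-01-01", "2020-11-30", 0.0, 1.0],
        ["2012-01-01", "2020-11-30", -0.2, 0.5],
    ],
    index=["AUD", "CAD", "GBP"],
    columns=["earliest", "latest", "mean_add", "sd_mult"],
)

# One row per category, columns as listed for df_xcats.
df_xcats = pd.DataFrame(
    data=[
        ["2010-01-01", "2020-12-31", 0.0, 1.0, 0.0, 0.3],
        ["2011-01-01", "2020-10-30", 1.0, 2.0, 0.9, 0.5],
    ],
    index=["XR", "CRY"],
    columns=["earliest", "latest", "mean_add", "sd_mult", "ar_coef", "back_coef"],
)

# With macrosynergy installed, the frames would be passed on like this:
# from macrosynergy.management.simulate.simulate_quantamental_data import make_qdf
# dfd = make_qdf(df_cids, df_xcats, back_ar=0.75)
```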
- make_qdf_black(df_cids, df_xcats, blackout)[source]#
Make quantamental DataFrame with basic columns: ‘cid’, ‘xcat’, ‘real_date’, ‘value’. In this DataFrame, the ‘value’ column consists of binary values denoting whether the cross-section is active on the corresponding dates.
- Parameters:
df_cids (DataFrame) – DataFrame with parameters by cid. Row indices are cross-sections. Columns are: ‘earliest’: string of earliest date (ISO) for which country values are available; ‘latest’: string of latest date (ISO) for which country values are available; ‘mean_add’: float of country-specific addition to any category’s mean; ‘sd_mult’: float of country-specific multiplier of a category’s standard deviation.
df_xcats (DataFrame) – DataFrame with parameters by xcat. Row indices are categories. Columns are: ‘earliest’: string of earliest date (ISO) for which category values are available; ‘latest’: string of latest date (ISO) for which category values are available; ‘mean_add’: float of category-specific addition; ‘sd_mult’: float of category-specific multiplier of a category’s standard deviation; ‘ar_coef’: float between 0 and 1 denoting set autocorrelation of the category; ‘back_coef’: float, coefficient with which communal (mean 0, SD 1) background factor is added to category values.
blackout (dict) – Dictionary defining the blackout periods for each cross-section. The expected form of the dictionary is: {‘AUD’: (Timestamp(‘2000-01-13 00:00:00’), Timestamp(‘2000-01-13 00:00:00’)), ‘USD_1’: (Timestamp(‘2000-01-03 00:00:00’), Timestamp(‘2000-01-05 00:00:00’)), ‘USD_2’: (Timestamp(‘2000-01-09 00:00:00’), Timestamp(‘2000-01-10 00:00:00’)), ‘USD_3’: (Timestamp(‘2000-01-12 00:00:00’), Timestamp(‘2000-01-12 00:00:00’))}. The values of the dictionary are tuples consisting of the start and end date of the respective blackout period. A cross-section can have more than one blackout period on a single category; in that case each key carries a numeric suffix that indexes the period, as in ‘USD_1’, ‘USD_2’ and ‘USD_3’.
- Return <pd.DataFrame>:
basic quantamental DataFrame according to specifications with binary values.
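To make the blackout dictionary more concrete, here is a small sketch that turns it into per-date binary flags for one cross-section. Several things are assumptions for illustration only: the helper name `blackout_flags`, the use of `datetime.date` in place of pandas `Timestamp`, the use of calendar days, and the convention that 1 marks a date inside a blackout period. The library's own convention may differ.

```python
from datetime import date, timedelta

# Same periods as in the documented example, with date instead of Timestamp.
blackout = {
    "AUD": (date(2000, 1, 13), date(2000, 1, 13)),
    "USD_1": (date(2000, 1, 3), date(2000, 1, 5)),
    "USD_2": (date(2000, 1, 9), date(2000, 1, 10)),
    "USD_3": (date(2000, 1, 12), date(2000, 1, 12)),
}


def blackout_flags(blackout, cid, start, end):
    """Return {date: 0 or 1} for one cross-section over [start, end]."""
    # Keep every period whose key is the cid itself or the cid plus a suffix.
    periods = [
        span for key, span in blackout.items() if key.split("_")[0] == cid
    ]
    flags = {}
    day = start
    while day <= end:
        # Assumed convention: 1 inside any blackout period, 0 otherwise.
        flags[day] = int(any(lo <= day <= hi for lo, hi in periods))
        day += timedelta(days=1)
    return flags


usd = blackout_flags(blackout, "USD", date(2000, 1, 1), date(2000, 1, 14))
aud = blackout_flags(blackout, "AUD", date(2000, 1, 1), date(2000, 1, 14))
```

The three ‘USD_n’ keys are merged into a single series for ‘USD’, which is the purpose of the numeric suffix.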
- generate_lines(sig_len, style='linear')[source]#
Returns a numpy array of a line with a given length.
- Parameters:
sig_len (int) – The number of elements in the returned array.
style (str) – The style of the line. Default ‘linear’. Current choices are: linear, decreasing-linear, sharp-hill, four-bit-sine, sine, cosine, sawtooth. Adding “inv” or “inverted” to the style will return the inverted version of that line. For example, ‘inv-sawtooth’ or ‘inverted sawtooth’ will return the inverted sawtooth line. ‘any’ will return a random line. ‘all’ will return a list of all the available styles.
- Return <Union[np.ndarray, List[str]]>:
A numpy array of the line. If style is ‘all’, then a list (of strings) of all the available styles is returned.
NOTE: It is indeed possible to request an “inverted linear” or “inverted decreasing-linear” line. They’re just there for completeness and readability.
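The style naming, including the “inv”/“inverted” prefix, can be sketched as follows. This is a hypothetical, dependency-free stand-in: the name `line_sketch`, the 0 to 1 value range, the reading of inversion as a reflection around the midpoint, and the restriction to three styles are assumptions for this example. The library's scaling and shapes may differ.

```python
import math


def line_sketch(sig_len, style="linear"):
    """Hypothetical stand-in for generate_lines covering three styles only."""
    # Normalise separators so 'inverted sine' and 'inverted-sine' match.
    name = style.lower().replace("_", "-").replace(" ", "-")
    inverted = False
    for prefix in ("inverted-", "inv-"):
        if name.startswith(prefix):
            name, inverted = name[len(prefix):], True
            break
    # Evenly spaced positions from 0.0 to 1.0.
    steps = [i / max(sig_len - 1, 1) for i in range(sig_len)]
    if name == "linear":
        line = steps
    elif name == "decreasing-linear":
        line = [1.0 - s for s in steps]
    elif name == "sine":
        line = [0.5 + 0.5 * math.sin(2 * math.pi * s) for s in steps]
    else:
        raise ValueError(f"style not covered by this sketch: {style!r}")
    # Assumed meaning of 'inverted': reflect the values around the midpoint.
    return [1.0 - v for v in line] if inverted else line


rising = line_sketch(5, "linear")
falling = line_sketch(5, "inverted linear")
```

Under these assumptions an inverted linear line coincides with a decreasing-linear one, which is why those inverted styles add nothing new.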
- make_test_df(cids=['AUD', 'CAD', 'GBP'], xcats=['XR', 'CRY'], start='2010-01-01', end='2020-12-31', style='any')[source]#
Generates a test dataframe with pre-defined values. These values are meant to be used for testing purposes only. The function generates a standard quantamental dataframe in which the value column is populated with pre-defined values. These values are simple lines or waves that are easy to identify and differentiate in a plot.
- Parameters:
start – An ISO-formatted date string.
end – An ISO-formatted date string.
style (
str
) – A string that specifies the type of line to generate. Current choices are: ‘linear’, ‘decreasing-linear’, ‘sharp-hill’, ‘four-bit-sine’, ‘sine’, ‘cosine’, ‘sawtooth’, ‘any’. See macrosynergy.management.simulate.simulate_quantamental_data.generate_lines().
- Return type: