msecoreml package

Submodules

msecoreml.pddataframeex module

class msecoreml.pddataframeex.PdDataframeEx

Bases: object

A collection of static utility methods that wrap common commands

static add_pct_change_cols(dframe, src_colname, change_colname, startoffset=1, maxoffset=1)

Compute the percentage changes in a column’s value for multiple intervals and add the changes for each interval as a new column to the data frame. For example, compute the percent change relative to each of the last 5 measurements.

Parameters:
  • dframe (pd.DataFrame) –
  • src_colname
  • change_colname
  • startoffset
  • maxoffset
static add_series_to_each_col(series, dataframe)

Given a dataframe M of columns [a, b, c, …] and a series of summands [s₀, s₁, …, sₙ], return the dataframe [a’, b’, c’, …] in which the series has been added to each column:

a’ = [s₀ + a₀, s₁ + a₁, …, sₙ + aₙ]
b’ = [s₀ + b₀, s₁ + b₁, …, sₙ + bₙ]
c’ = [s₀ + c₀, s₁ + c₁, …, sₙ + cₙ]
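The row-wise addition described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the dataframe is modelled as a dict of column lists, and the name add_series_to_each_col_sketch is hypothetical. With real pandas objects, dataframe.add(series, axis=0) is the likely analogue.

```python
# Stand-in for a dataframe: a dict mapping column name -> list of row values.
def add_series_to_each_col_sketch(series, columns):
    """Return new columns where series[i] has been added to row i of every column."""
    return {
        name: [s + x for s, x in zip(series, values)]
        for name, values in columns.items()
    }

frame = {"a": [1, 2, 3], "b": [10, 20, 30]}
summands = [100, 200, 300]
result = add_series_to_each_col_sketch(summands, frame)
```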

static agg_duplicates(data, identifiers, unit_cols, price_cols)
Transforms the dataframe to eliminate duplicate observations, using logic suited to common pricing problems: duplicate unit values are summed and duplicate prices are resolved by taking the minimum.
Parameters:
  • data – DataFrame of raw data that may have duplicates
  • identifiers – list of columns that identify unique entries in the data set
  • unit_cols – list of DataFrame columns for which duplicates should be summed
  • price_cols – list of DataFrame columns for which duplicates should be resolved by taking a min
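A minimal plain-Python sketch of the aggregation rule described by these parameters: rows sharing the same identifier values are merged, unit columns are summed, and price columns take the minimum. The function name, the row-dict representation, and the sample columns (store, week, units, price) are assumptions made for illustration. With pandas, a groupby on the identifiers followed by agg would be the natural equivalent.

```python
from collections import defaultdict

def agg_duplicates_sketch(rows, identifiers, unit_cols, price_cols):
    """Merge rows that share identifier values: sum unit columns, min price columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in identifiers)
        groups[key].append(row)

    merged_rows = []
    for key, dupes in groups.items():
        merged = dict(zip(identifiers, key))
        for c in unit_cols:
            merged[c] = sum(r[c] for r in dupes)
        for c in price_cols:
            merged[c] = min(r[c] for r in dupes)
        merged_rows.append(merged)
    return merged_rows

rows = [
    {"store": 1, "week": 1, "units": 3, "price": 2.0},
    {"store": 1, "week": 1, "units": 2, "price": 1.5},  # duplicate of the row above
    {"store": 1, "week": 2, "units": 4, "price": 2.0},
]
deduped = agg_duplicates_sketch(rows, ["store", "week"], ["units"], ["price"])
```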
static append_cols_inplace(dest_df, source_df)
static append_rows_inplace(dest_df, source_df, simple_index=True)

May change the storage type to floating-point precision. Works row-by-row, so it may be slow.

Parameters:
  • dest_df
  • source_df
  • simple_index – Two types of indexing are handled. simple_index=True (default) assumes that dest_df has a simple numerically increasing index, which is simply extended (so the index values of source_df are ignored). simple_index=False uses the index values of source_df when adding to dest_df (which could result in overwriting if index values are duplicated).
static append_series_to_dataframe(series, dataframe)

Create a dataframe by appending the given series as a new column to the end of the dataframe

Parameters:
  • series – The new column to append
  • dataframe – A dataframe with a single column index to which to append the column
Returns:

A single dataframe containing the dataframe with the new column appended

static append_to_col_multiindex(dataframe, new_name, new_values)
static append_to_row_index(df, col_names)
static append_uniform_col_index(dataframe, index_value, index_name)
static assign_inplace(dest_df, source_df)

It is hard to modify a dataframe in place (e.g. when the dataframe is a function parameter). This method assigns the contents of source_df to dest_df in place.

Parameters:
  • dest_df – Currently, this needs to be an empty df
  • source_df
static broadcast_scale(dataframe, scales)

Given a dataframe M of columns [c₀, c₁, …, cₘ] and a series of scales [s₀, s₁, …, sₙ], return the dataframe consisting of each column scaled by each scale (one column per scale/column pair):

[s₀*c₀, s₀*c₁, …, s₀*cₘ, s₁*c₀, s₁*c₁, …, s₁*cₘ, …, sₙ*c₀, sₙ*c₁, …, sₙ*cₘ] = [M*s₀, M*s₁, …, M*sₙ]
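The column ordering above (all columns for the first scale, then all columns for the second, and so on) can be illustrated with a plain-Python sketch. The dict-of-lists representation, the function name, and the (scale label, column name) keys are assumptions for illustration only; the library's actual column labels are not documented here.

```python
def broadcast_scale_sketch(columns, scales):
    """Return one scaled copy of every column per scale, grouped by scale."""
    scaled = {}
    for j, s in enumerate(scales):
        for name, values in columns.items():
            # Hypothetical label: (scale position, original column name).
            scaled[(f"s{j}", name)] = [s * x for x in values]
    return scaled

frame = {"c0": [1, 2], "c1": [3, 4]}
scaled = broadcast_scale_sketch(frame, [10, 100])
```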

static cast_col_inplace(dataframe, colname, dtype)

Cast a column in a dataframe to a new type - INPLACE

static concat_along_rows(dataframes, levels=None, name=None)

Create a dataframe combining all the given dataframes

Parameters:
  • dataframes – List of dataframes and series, all with the same number of rows and the same index
  • levels
  • name
Returns:

A single dataframe containing all the given columns

static concat_frames(frames)
Parameters:frames – An iterable of DataFrame objects
Returns:Single concatenated data frame
static drop_col_inplace(dataframe, col)
static drop_dates(data, date_col, ratio_dates_to_drop)

Provides a subsample of the data where whole dates are either kept or dropped. Does so in such a way that the kept dates are equally spaced.

Parameters:
  • data
  • date_col
  • ratio_dates_to_drop – For every kept date, how many to drop. 0 means don’t drop any
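One way to read ratio_dates_to_drop is "keep one date, then skip the next ratio_dates_to_drop dates". The sketch below follows that reading in plain Python. It assumes the first date is always kept, which the documentation does not state, and the function name and row-dict representation are hypothetical.

```python
from datetime import date

def drop_dates_sketch(rows, date_col, ratio_dates_to_drop):
    """Keep every (ratio + 1)-th distinct date and all rows that fall on kept dates."""
    dates = sorted({row[date_col] for row in rows})
    kept = set(dates[:: ratio_dates_to_drop + 1])
    return [row for row in rows if row[date_col] in kept]

# Seven consecutive days, two observations per day.
rows = [
    {"date": date(2020, 1, d), "item": item}
    for d in range(1, 8)
    for item in ("a", "b")
]
sample = drop_dates_sketch(rows, "date", ratio_dates_to_drop=2)
kept_days = sorted({row["date"].day for row in sample})
```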
static drop_index_level_unlabelled(df, levels=0, droplevel=True)

Opposite version of keep_index_level_unlabelled(). See there for help

static drop_rows_index_vals(df, vals, level=None)

Like the opposite of DataFrame.xs (except that it can’t yet take in tuples for vals and level).

Parameters:
  • df
  • vals – vals or tuple of vals
  • level – level name (or int) or list of levels
Returns:

modified df

static fill_missing(data, panel_cols, date_col, day_interval, fill_nan=None, fill_zero=None)
Fills in missing observations between the first and last available date at the cadence specified.
Newly created values for all columns default to their last available value (unless specified otherwise)
Parameters:
  • data – Dataframe containing raw data
  • panel_cols – list of columns that serve as panel identifiers
  • date_col – Column which indicates the date. Must be populated by datetime
  • fill_nan – list of DataFrame columns which are set to nan for all missing obs
  • fill_zero – list of DataFrame columns which are set to zero for all missing obs
  • day_interval – Number of days between consecutive observations
static gen_cartesian_product(series_list, series_names)

Compute the cartesian product of the given series. The built-in cartesian product forgets the types, so they are converted back afterwards.

static gen_df_diff(df, var)
static gen_labels(dframe, cond)

For ML - generates “labels” of 0 and 1. Applies the given condition to each row in the dframe and produces a 0 or 1 label.

Parameters:
  • dframe (pd.DataFrame) –
  • cond
static gen_pct_change(dframe, colname, offset=1)

Generate a series representing the percentage changes in a column’s value.

Parameters:
  • dframe (pd.DataFrame) –
  • colname
  • offset (int) – How many intervals back should we compare to?
Returns:

Pct changes for all rows in the dataframe
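A small sketch of the offset comparison, in plain Python. It assumes the change is expressed as a fraction, (xₜ - xₜ₋ₖ) / xₜ₋ₖ for offset k, which is the convention pandas' pct_change uses. Whether this method scales the result by 100 is not stated here. The function name is hypothetical.

```python
def pct_change_sketch(values, offset=1):
    """Fractional change of each value relative to the value `offset` rows earlier."""
    changes = []
    for t, x in enumerate(values):
        if t < offset:
            changes.append(None)  # no earlier value to compare against
        else:
            prev = values[t - offset]
            changes.append((x - prev) / prev)
    return changes

prices = [100.0, 110.0, 121.0, 60.5]
one_back = pct_change_sketch(prices, offset=1)
two_back = pct_change_sketch(prices, offset=2)
```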

static get_cell_by_multiindex(dataframe, row_level_value_by_name, col_level_value_by_name)

Get a single cell as specified by the given row and column multiindex level values.

Parameters:
  • dataframe (pd.DataFrame) – The dataframe from which to get the value
  • row_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the row multiindex
  • col_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the col multiindex
Returns:

The specified value

Raises:

Exception raised if no such cell can be found or if multiple such cells are found.

static get_col_by_multiindex(dataframe, level_value_by_name, as_series=True)

Get a single column as specified by the given multiindex level values.

Parameters:
  • dataframe (pd.DataFrame) – The dataframe from which to get the column.
  • level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the multiindex
  • as_series (bool) – If True, return the column as a pd.Series, thereby dropping any column multiindex information. If False, return a pd.DataFrame consisting of a single column.
Returns:

The specified column, as either a pd.Series or pd.DataFrame.

Raises:

Exception raised if no such column can be found or if multiple such columns are found.

static get_col_by_name(dataframe, col_name)
static get_col_from_indicator(df, indicator)
static get_col_index_values(df, index_name)
static get_column_or_index(dataframe, name)
static get_nan_inf_indicator(df)
static get_row_index_values(df, index_name)
static groupbynot(df, not_list)
static impute_panel_column(dataframe, index_name, fill_value=0)

For a given dataframe consisting of one or more panels, fill rows such that for the given row index, each panel has a row for each known level.

For example, in the following table, there is no entry for apples on Tuesday. Therefore, that row is added with the specified fill value (here, 0.0).

NOTE: The same fill value is used across all columns, regardless of column type.

day  item     price  quality
Mon  oranges  1.0    ‘good’
Tue  oranges  1.1    ‘good’
Mon  apples   2.0    ‘okay’
Mon  bananas  3.0    ‘okay’
Tue  bananas  3.1    ‘good’

=>

day  item     price  quality
Mon  oranges  1.0    ‘good’
Tue  oranges  1.1    ‘good’
Mon  apples   2.0    ‘okay’
Tue  apples   0.0    0.0
Mon  bananas  3.0    ‘okay’
Tue  bananas  3.1    ‘good’
Parameters:
  • dataframe – The dataframe into which to insert rows.
  • index_name – The name of the index level for which to fill in missing items
  • fill_value – The value to use for filling in missing entries
Returns:

A copy of the original dataframe with missing rows inserted.
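The fill rule from the fruit example can be sketched in plain Python. Rows are modelled as dicts, and the panel is identified by an explicit panel_col argument; both are assumptions for illustration, since the real method works on a dataframe's row index. The sketch also assumes at most one row per (panel, level) pair.

```python
def impute_panel_sketch(rows, panel_col, index_name, fill_value=0):
    """Add a filled row for every (panel, level) pair that is missing from rows."""
    panels, levels = [], []
    for row in rows:
        if row[panel_col] not in panels:
            panels.append(row[panel_col])
        if row[index_name] not in levels:
            levels.append(row[index_name])
    by_key = {(row[panel_col], row[index_name]): row for row in rows}
    value_cols = [c for c in rows[0] if c not in (panel_col, index_name)]

    filled_rows = []
    for panel in panels:
        for level in levels:
            if (panel, level) in by_key:
                filled_rows.append(dict(by_key[(panel, level)]))
            else:
                # The same fill value is used for every column, whatever its type.
                new_row = {index_name: level, panel_col: panel}
                new_row.update({c: fill_value for c in value_cols})
                filled_rows.append(new_row)
    return filled_rows

rows = [
    {"day": "Mon", "item": "oranges", "price": 1.0, "quality": "good"},
    {"day": "Tue", "item": "oranges", "price": 1.1, "quality": "good"},
    {"day": "Mon", "item": "apples", "price": 2.0, "quality": "okay"},
    {"day": "Mon", "item": "bananas", "price": 3.0, "quality": "okay"},
    {"day": "Tue", "item": "bananas", "price": 3.1, "quality": "good"},
]
imputed = impute_panel_sketch(rows, panel_col="item", index_name="day", fill_value=0.0)
```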

static interact_dataframes(dataframe_list)
static interact_series_with_dataframe(dataframe, scales, col_name_format=None)

Create a dataframe by multiplying each column in the dataframe by a corresponding series of scale values

Parameters:
  • dataframe – A dataframe with k columns and n rows
  • scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
  • col_name_format
Returns:

A dataframe where each column is the pairwise product of the original column with the scale values [x₀, x₁, … xₙ] -> [x₀s₀, x₁s₁, …, xₙsₙ]

static keep_index_level_unlabelled(df, levels=0, droplevel=True)

Returns the modified df, with rows dropped where the index levels have values that aren’t in the label list. Such values end up being printed as nan. If you extract the index as idx=mi.remove_levels([<other levels>]), then math.isnan(idx.values)==True for some values.

static multiindex_merge_m1_full(dfl, dfr)

Does an m:1 merge from dfl to dfr, assuming dfr.index.names is a subset of dfl.index.names. The built-in pandas utilities don’t work for this.

static panel_type(df)
static prepend_col_names(dataframe, prefix)

Prepend each column name in the given dataframe with the specified prefix

Parameters:
  • dataframe – The dataframe in which to prepend the columns
  • prefix – The string to pre-pend to the column names
static prepend_series_to_dataframe(series, dataframe)

Create a dataframe by prepending the given series as a new column at the start of the dataframe

Parameters:
  • series – The new column to prepend
  • dataframe – A dataframe with a single column index to which to prepend the column
Returns:

A single dataframe containing the dataframe with the new column prepended

static reset_row_index(df, inplace=False)
static select_and_drop(df, colname, val)

Selects on a variable and then drops it. (Like how xs selects on a value of an index-level and then drops the level.)

static set_row_index(df, col_names)
static stack_as_block_diagonal(dataframe, index_name)

For a given dataframe, add/drop columns as-needed so that its column index matches the specified target index. Columns added are filled with zeroes.

Parameters:
  • dataframe – The base dataframe to pivot to a block diagonal
  • index_name – The name of the index on which to block rows
Returns:

A dataframe where the rows have been pivoted such that each level in the index is a block in a block diagonal.

site  1     2       3
item  A  B  A   B   A    B
      1  2  10  20  100  200
      3  4  30  40  300  400

=> (index_name = site)

site  1     2       3
item  A  B  A   B   A    B
1     1  2
      3  4
2           10  20
            30  40
3                   100  200
                    300  400

=> (index_name = item)

site  1  2   3    1  2   3
item  A  A   A    B  B   B
A     1  10  100
      3  30  300
B                 2  20  200
                  4  40  400
static str_cat(dataframe, col_names, sep=None)

String-concatenate the given columns (by name). Returns a new series with the concatenated text.

static str_split_paths(dataframe, colname, sep='/')

If Column Data is a ‘path’ into a hierarchy, this method will split the paths and return a data frame with each level in the path represented by its own column
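A plain-Python sketch of the path split. It assumes that shorter paths are padded with None so that every row has one entry per level, which is how pandas' str.split(expand=True) behaves; whether this method pads the same way is not documented here. The function name is hypothetical.

```python
def str_split_paths_sketch(paths, sep="/"):
    """Split each path into its levels, padding shorter paths with None."""
    parts = [p.split(sep) for p in paths]
    depth = max(len(p) for p in parts)
    return [p + [None] * (depth - len(p)) for p in parts]

levels = str_split_paths_sketch(["food/fruit/apple", "food/dairy", "toys"])
```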

static using_default_index(dataframe)
class msecoreml.pddataframeex.UnitTimeDFType

Bases: enum.Enum

An enumeration.

CROSS_SECTION = 1
PANEL = 3
TIME_SERIES = 2

msecoreml.pdgroupbyex module

class msecoreml.pdgroupbyex.PdGroupByEx

Bases: object

LAG_COL_NAME = '{0} lag{1}'
LAG_DIFF_COL_NAME = '{0} lag_diff{1}'
LEAD_COL_NAME = '{0} lead{1}'
LEAD_DIFF_COL_NAME = '{0} lead_diff{1}'
static exp_moving_avg(groupby, halflife, min_periods)
Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w), where wᵢ is the weight of the observation i periods before the current one:

wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (w₀·xₜ + w₁·xₜ₋₁ + … + wₜ·x₀) / (w₀ + w₁ + … + wₜ)
Parameters:
  • groupby
  • halflife – The period of time for the exponential weight to reduce to one half
  • min_periods – The minimum number of non-missing values in the window, including current value, required (NaN returned otherwise)
Returns:

A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance

static gen_fore_diffs(groupby, leads)

Create a dataframe where each column corresponds to the difference of an offset series with the given series

Parameters:
  • groupby
  • leads – The values by which to shift the series
Returns:

A list of #fore columns where column i is the difference of the left-shifted series (shift of size fore - i) and the given series:

if n = 3 and fore = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]

In general, for a given fore = l and a series of length n, colₖ = [xₗ-x₀, xₗ₊₁-x₁, …, NA]

static gen_lag_diffs(groupby, lags)

Create a dataframe where each column corresponds to the difference of a given series and an offset series

Parameters:
  • groupby
  • lags – The maximum number of values by which to shift the series
Returns:

A list of #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]

In general, for a given lag = l and a series of length n, colₖ = [NA, NA, …, xₖ-x₀, …, xₖ-xₙ₋ₗ₋₁₊ₖ]

static gen_lags(groupby, lag)

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:
  • groupby
  • lag – The maximum number of values by which to shift the series
Returns:

A list of #lag columns where column i is a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]

In general, for a given lag = l and a series of length n, colₖ = [NA, NA, …, x₀, …, xₙ₋ₗ₋₁₊ₖ]

static gen_leads(groupby, leads)

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:
  • groupby
  • leads – The numbers of values by which to shift the series
Returns:

A list of #lead columns where column i is a left shift of size (lead - i) of the given series:

if n = 3 and lead = 2
series = [x₀, x₁, x₂]
col₀ = [x₁, x₂, NA]
col₁ = [x₂, NA, NA]

In general, for a given lead = l and a series of length n, colₖ = [xₖ₊₁, xₖ₊₂, …, xₙ, NA, …, NA]

static get_selection_name(groupby)
static get_shift_name(name_pattern, col_name, shift)

msecoreml.pdmultiindexex module

class msecoreml.pdmultiindexex.PdMultiIndexEx

Bases: object

A collection of utility methods for working with Pandas multi-indexes.

static align_index(multiindex, index_names)

Align given multiindex to specified index

Get an updated version of the given multiindex with indexes ordered as specified by the list of index names. Indexes not yet contained in the multiindex are added with all missing values, and indexes that are in the multiindex but not in index_names are dropped.

Parameters:
  • multiindex (MultiIndex) – The multiindex to update
  • index_names (list(str)) – An ordered list of index names for the new index
Returns:

An updated version of the given multiindex with indexes ordered as specified by the set of index names where indexes not yet contained in the multiindex are added but have all missing values.

Return type:

MultiIndex

For the following examples, let multiindex be the following column multiindex

letter A B A B A A
number 1 1 1 2 2 2
blah 0 0 A A C C
Example:

Input:

  • index_names: [number, missing, blah, letter]

Output:

number 1 1 1 2 2 2
missing None None None None None None
letter A B A B A A
blah 0 0 A A C C
Example:

Input:

  • index_names: [number, missing]

Output:

number 1 1 1 2 2 2
missing None None None None None None
static concat_index_levels(midx, sep='_', filter_blanks=True, name=None)

Takes all the levels of a multiindex, converts them to strings and then concatenates them into a plain Index.

Parameters:
  • midx
  • sep (str) – Separator
  • filter_blanks – Do we only put sep between non-blank entries?
  • name – Final name for the Index
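The effect of sep and filter_blanks can be illustrated with a plain-Python sketch that treats the multiindex as a list of tuples. The function name is hypothetical, and the sketch assumes that "blank" means the empty string after conversion to str.

```python
def concat_index_levels_sketch(tuples, sep="_", filter_blanks=True):
    """Join the stringified level values of each tuple into a single label."""
    labels = []
    for tup in tuples:
        parts = [str(v) for v in tup]
        if filter_blanks:
            # Only put the separator between non-blank entries.
            parts = [p for p in parts if p != ""]
        labels.append(sep.join(parts))
    return labels

entries = [("price", "lag", 1), ("price", "", 2), ("units", "", "")]
filtered = concat_index_levels_sketch(entries)
unfiltered = concat_index_levels_sketch(entries, filter_blanks=False)
```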

static fillna_multiindex(idx, value='', level_nums=None)

Sometimes a level l in a multi-index will have NaNs, stored as -1 in .labels[l], for something not in .levels[l].

.get_level_values(l) #will return things with math.nan
.fillna()/.hasnans #aren't implemented
df.index.set_levels([l.fillna(value) for l in df.index.levels], inplace=True) #Doesn't work
index.get_level_values(l).fillna('') #returns some right
df.index.set_levels([df.index.get_level_values(l).fillna('') for l in range(len(df.index.levels))], 
                    inplace=True) #Doesn't work
static get_1level_block_indicator(multiindex, index_name, index_values)

Get an indicator array indicating the positions in the multiindex with one of the specified values

Parameters:
  • multiindex (MultiIndex) – The multiindex for which to get an indicator
  • index_name (str) – The name of the index for which to build an indicator for the specified values
  • index_values (list(str or int)) – A list of values for which the indicator should be True
Returns:

A boolean array corresponding to the values of the multiindex, with True values corresponding to multiindex locations with the specified values.

Return type:

numpy.array of bool

For the following examples, let the multiindex be the column index of the following dataframe

site 1 1 2 2 3 NaN
item A B A B A B
  1 2 10 20 100 200
  3 4 30 40 300 400
Example:

Input:

  • index_name: site
  • index_values: [1, 2]

Output:

[True, True, True, True, False, False]

Example:

Input:

  • index_name: site
  • index_values: [NaN]
Output:

[False, False, False, False, False, True]

static get_aligned_index(source_index, target_index_names)
static get_levelvalues_by_name(multiindex, index_name)

Return the list of level values for the index with the specified name

Parameters:
  • multiindex – The multi-index containing the specified target index
  • index_name – The name of the index for which to get the level values
Returns:

A list of level values for the index with the specified name

site 1 1 2 2 3 3
item A B A B A B
  1 2 10 20 100 200
  3 4 30 40 300 400

index_name = site => [1, 2, 3] index_name = item => [A, B]

static get_nlevel_block_indicator(multiindex, index_value_byname)

Get an indicator array indicating the positions in the multiindex with the specified values

Parameters:
  • multiindex (MultiIndex) – The multiindex for which to get an indicator
  • index_value_byname (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex.
Returns:

A boolean array corresponding to the values of the multiindex, with True values corresponding to multiindex locations with the specified values.

Return type:

numpy.array of bool

For the following examples, let the multiindex be the column index of the following dataframe

site 1 1 2 NaN 3 NaN
item A B A B A B
  1 2 10 20 100 200
  3 4 30 40 300 400
Example:

Input:

  • index_value_byname: {site:1, item:B}

Output:

[False, True, False, False, False, False]

Example:

Input:

  • index_value_byname: {site:NaN, item:B}

Output:

[False, False, False, True, False, True]
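The two examples above can be reproduced with a plain-Python sketch in which the multiindex is a list of (site, item) tuples. The helper names are hypothetical, and the sketch returns a list of bool rather than a numpy array. NaN needs special handling because NaN does not compare equal to itself.

```python
import math

def _matches(value, target):
    """Equality that also treats NaN as equal to NaN."""
    value_nan = isinstance(value, float) and math.isnan(value)
    target_nan = isinstance(target, float) and math.isnan(target)
    if value_nan or target_nan:
        return value_nan and target_nan
    return value == target

def nlevel_block_indicator_sketch(names, tuples, index_value_byname):
    """True at every position whose level values match all the requested values."""
    positions = {name: i for i, name in enumerate(names)}
    return [
        all(
            _matches(tup[positions[name]], target)
            for name, target in index_value_byname.items()
        )
        for tup in tuples
    ]

nan = float("nan")
names = ["site", "item"]
columns = [(1, "A"), (1, "B"), (2, "A"), (nan, "B"), (3, "A"), (nan, "B")]
ind_1b = nlevel_block_indicator_sketch(names, columns, {"site": 1, "item": "B"})
ind_nan_b = nlevel_block_indicator_sketch(names, columns, {"site": nan, "item": "B"})
```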

static get_nlevel_singleton_indicator(multiindex, value_by_name)

Get an indicator array indicating the single position in the multiindex with the specified values

Parameters:
  • multiindex (MultiIndex) – The multiindex for which to get an indicator
  • value_by_name (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex
Returns:

A boolean array corresponding to the values of the multiindex with a single True value corresponding to the multiindex location with the specified values.

Return type:

numpy.array of bool

For the following examples, let the multiindex be the column index of the following dataframe

site 1 1 2 2 3 NaN
item A B A B A B
  1 2 10 20 100 200
  3 4 30 40 300 400
Example:

Input:

  • value_by_name: {site:1, item:B}

Output:

[False, True, False, False, False, False]

Example:

Input:

  • value_by_name: {site:NaN, item:B}

Output:

[False, False, False, False, False, True]

static get_some_levels_block_indicator(multiindex, index_value_byname)

Get an indicator array indicating the positions in the multiindex with the specified values

Parameters:
  • multiindex – The multiindex for which to get an indicator
  • index_value_byname – A dictionary containing a mapping of index name to target value for some (not necessarily all) indices in the multiindex.
Returns:

A boolean array corresponding to the values of the multiindex, with True values corresponding to multiindex locations with the specified values.

site 1 1 2 2 NaN NaN
item A B A B A B
  1 2 10 20 100 200
  3 4 30 40 300 400

{site:1, item:B} => [False, True, False, False, False, False]
{site:NaN, item:B} => [False, False, False, False, False, True]

static midx_from_list_of_namedarray(na_list)

Create a MultiIndex from a list of Index or pd.Series objects

Parameters:na_list – list of Index or pd.Series objects (each needs .values and .name)
static rename_mi_level_names(idx, level, dict_map)
static to_frame(midx, index=True)

msecoreml.pdonehotencoder module

class msecoreml.pdonehotencoder.PdOneHotEncoder(drop_base=True, base_criteria='freq')

Bases: msecoreml.basetransformer.BaseTransformer

Wrapper for pandas.Series One-Hot encoder

The transformed values will adhere to the following:

  • The row index of the returned dataframe will equal that of the given series (the series to be transformed)
  • The column index of the returned dataframe will contain a single index with name equal to the name of the series and the level of each column corresponding the value of the item that column encodes.

Note:

  • The empty string is encoded like any other string.
  • None is treated as missing and is not encoded.

For example, given the following series:

Letter  Number  ValueToEncode
A       1       “Coke”
A       2       “Pepsi”
B       1       “Tab”
B       2       “Pepsi”
C       1       “”
C       2       None

The following dataframe is returned, where the column multi-index contains a single index ValueToEncode with levels [Coke, Pepsi, Tab, “”]:

Letter  Number  “Coke”  “Pepsi”  “Tab”  “”
A       1       1       0        0      0
A       2       0       1        0      0
B       1       0       0        1      0
B       2       0       1        0      0
C       1       0       0        0      1
C       2       0       0        0      0
BASE_FREQ = 'freq'
__init__(drop_base=True, base_criteria='freq')
Parameters:
  • drop_base – When returning the transformation, do we include one-hots for all values, or for all-minus-one? The latter is sometimes called “dummy variables”, in contrast to “normal” one-hot encoding, which encodes all values.
  • base_criteria – How to determine base. Options: freq - String. Modal value. This works better with penalization. <number> - Number as String. Value to be the base.
base_criteria
drop_base
fit(series)

Fit the encoder to the given series

Parameters:series – The series to which to fit an encoding
fit_transform(series)

Fit an encoder to the given series and return the encoded values of the series

Parameters:series – The series to which to fit an encoding and then transform.
Returns:The dataframe resulting from encoding the given series
labels
transform(series)

Encode the given series using the previously fit encoding

Parameters:series – The series to encode
Returns:The dataframe resulting from encoding the given series

If a new value is encountered in the series (i.e., a value that was not present in the series used to fit the encoder), the resulting encoding for that value will be all zeroes. For example,

Fit:

ValueToEncode  ==>  ValueToEncode  Coke  Pepsi
Coke                               1     0
Pepsi                              0     1
Pepsi                              0     1

Transform:

ValueToEncode  ==>  ValueToEncode  Coke  Pepsi
Coke                               1     0
Tab                                0     0
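The fit/transform behaviour for unseen values can be sketched with a small stand-in class in plain Python. OneHotSketch is hypothetical and deliberately ignores drop_base and base_criteria; it only shows that values not seen during fit (and None) are encoded as all zeroes.

```python
class OneHotSketch:
    """Minimal stand-in for a fitted one-hot encoder (no base dropping)."""

    def fit(self, values):
        # Remember the distinct non-missing values in first-seen order.
        self.labels = []
        for v in values:
            if v is not None and v not in self.labels:
                self.labels.append(v)
        return self

    def transform(self, values):
        # One row per input value, one 0/1 entry per fitted label.
        return [[1 if v == label else 0 for label in self.labels] for v in values]

encoder = OneHotSketch().fit(["Coke", "Pepsi", "Pepsi"])
fitted = encoder.transform(["Coke", "Pepsi", "Pepsi"])
encoded = encoder.transform(["Coke", "Tab", None])
```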

msecoreml.pdseriesex module

class msecoreml.pdseriesex.PdSeriesEx

Bases: object

static avg_across_series(series_collection)

Create a series by taking the average across the given set of series

Parameters:series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]]
Returns:The elementwise average of the series [(x1₀ + x2₀ + .. + xK₀) / K, (x1₁ + x2₁ + .. + xK₁) / K,… ]
static concat_along_rows(series, levels=None, name=None)

Create a dataframe combining all the given series

Parameters:
  • series – List of series
  • levels
  • name
Returns:

A single dataframe containing all the given columns

static exp_moving_avg(series, halflife, min_periods)

Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w), where wᵢ is the weight of the observation i periods before the current one:

wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (w₀·xₜ + w₁·xₜ₋₁ + … + wₜ·x₀) / (w₀ + w₁ + … + wₜ)

Parameters:
  • series – data
  • halflife – The period of time for the exponential weight to reduce to one half
  • min_periods – The minimum number of non-missing values in the window required (NaN returned otherwise)
Returns:

A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance
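The weighted average can be written out directly in plain Python. This sketch assumes that the weight wᵢ applies to the observation i steps in the past (the usual half-life convention, so the most recent value has weight 1) and that missing values are represented by None and skipped. The function name is hypothetical, and the real method presumably delegates to pandas rather than looping.

```python
import math

def exp_moving_avg_sketch(values, halflife, min_periods=1):
    """Exponentially weighted mean of all values up to and including each position."""
    decay = math.exp(math.log(0.5) / halflife)  # weight ratio between adjacent steps
    averages = []
    for t in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0.0
        count = 0
        for age in range(t + 1):
            x = values[t - age]
            if x is None:
                continue  # missing observations contribute nothing
            w = decay ** age
            weighted_sum += w * x
            weight_total += w
            count += 1
        if count >= max(min_periods, 1):
            averages.append(weighted_sum / weight_total)
        else:
            averages.append(None)
    return averages

ema = exp_moving_avg_sketch([1.0, 2.0, 4.0], halflife=1)
ema_min2 = exp_moving_avg_sketch([1.0, 2.0, 4.0], halflife=1, min_periods=2)
```

With halflife=1 each step back halves the weight, so the last value is (4 + 0.5·2 + 0.25·1) / (1 + 0.5 + 0.25).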

static gen_fore_diffs(series, fore)

Create a dataframe where each column corresponds to the difference of an offset series with the given series

Parameters:
  • series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
  • fore – The maximum number of values by which to shift the series
Returns:

A dataframe with #fore columns where column i is the difference of the left-shifted series (shift of size fore - i) and the given series:

if n = 3 and fore = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]

In general, for a given fore = l and a series of length n, colₖ = [xₗ-x₀, xₗ₊₁-x₁, …, NA]
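The forward differences in this example can be reproduced with a plain-Python sketch, using None for NA. The helper names are hypothetical. With pandas objects, series.shift(-k) - series is the likely building block.

```python
def shift_left(values, k):
    """Shift values k positions toward earlier rows, padding the end with None."""
    k = min(k, len(values))
    return values[k:] + [None] * k

def gen_fore_diffs_sketch(values, fore):
    """Column i is (series shifted left by fore - i) minus the series itself."""
    columns = []
    for i in range(fore):
        shifted = shift_left(values, fore - i)
        columns.append([
            None if ahead is None else ahead - now
            for ahead, now in zip(shifted, values)
        ])
    return columns

# With x = [1, 4, 9]: x2 - x0 = 8, x1 - x0 = 3, x2 - x1 = 5.
fore_diffs = gen_fore_diffs_sketch([1, 4, 9], fore=2)
```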

static gen_lag_diffs(series, lag)

Create a dataframe where each column corresponds to the difference of a given series and an offset series

Parameters:
  • series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
  • lag – The maximum number of values by which to shift the series
Returns:

A dataframe with #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]

In general, for a given lag = l and a series of length n, colₖ = [NA, NA, …, xₖ-x₀, …, xₖ-xₙ₋ₗ₋₁₊ₖ]

static gen_lags(series, lag)

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:
  • series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
  • lag – The maximum number of values by which to shift the series
Returns:

A dataframe with #lag columns where column i is a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]

In general, for a given lag = l and a series of length n, colₖ = [NA, NA, …, x₀, …, xₙ₋ₗ₋₁₊ₖ]
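A plain-Python sketch of the lag columns in this example, using None for NA. The helper names are hypothetical; with a pandas series, series.shift(k) performs the same right shift.

```python
def shift_right(values, k):
    """Shift values k positions toward later rows, padding the start with None."""
    k = min(k, len(values))
    return [None] * k + values[: len(values) - k]

def gen_lags_sketch(values, lag):
    """Column i holds the series shifted right by (lag - i)."""
    return [shift_right(values, lag - i) for i in range(lag)]

lag_cols = gen_lags_sketch(["x0", "x1", "x2"], lag=2)
```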

static gen_leads(series, lead)
static gen_log_series(series)

Create a series by taking the log of each instance in the given series.

Parameters:series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
Returns:A series of the log values of the given series [log(x₀), log(x₁), …, log(xₙ)]
static gen_scaled_series(series, scales)

Create a series by multiplying a given series by a corresponding series of scale values

Parameters:
  • series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
  • scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
Returns:

The pairwise product of the two series [x₀s₀, x₁s₁, …, xₙsₙ]

static get_nan_inf_indicator(series)
static root_meansqr(series)
static sum_across_series(series_collection)

Create a series by taking the sum across the given set of series

Parameters:series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]]
Returns:The elementwise sum of the series [(x1₀ + x2₀ + .. + xK₀), (x1₁ + x2₁ + .. + xK₁),… ]

msecoreml.sample_splitting module

class msecoreml.sample_splitting.SampleSplitting

Bases: object

Classes for dealing with fold info

static filter_folds(row_selector, folds=None)

Parameters:
  • row_selector – 1/0 array
  • folds – folds

static filter_index_list(row_selector_bool, index_list)
class msecoreml.sample_splitting.SingleFoldFullOverlap

Bases: object

Looks like a KFold-type object, but returns a single fold with all (non-label) indexes in both train and test. Used for the NO_SPLIT option in DoubleML or DynamicDML.

get_n_splits(X)
n_splits
split(X, y=None)
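The "single fold, full overlap" idea can be sketched as a tiny splitter class in plain Python. SingleFoldFullOverlapSketch is a hypothetical stand-in that mirrors the member names listed above; it assumes that split yields positional indexes, as KFold-style splitters usually do.

```python
class SingleFoldFullOverlapSketch:
    """KFold-like splitter that yields one fold where train and test are identical."""

    n_splits = 1

    def get_n_splits(self, X=None):
        return self.n_splits

    def split(self, X, y=None):
        # Every positional index appears in both the train and the test set.
        indices = list(range(len(X)))
        yield indices, list(indices)

splitter = SingleFoldFullOverlapSketch()
folds = list(splitter.split(["a", "b", "c"]))
```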

Module contents