msecoreml package¶

Submodules¶

msecoreml.pddataframeex module¶

class msecoreml.pddataframeex.PdDataframeEx¶

Bases: object

A collection of static utility methods that wrap common commands

static add_pct_change_cols(dframe, src_colname, change_colname, startoffset=1, maxoffset=1)¶

Compute the percentage change in a column’s value over multiple intervals and add the change for each interval as a new column in the data frame. For example, compute the percent change relative to each of the last 5 measurements.

Parameters:	dframe (pd.DataFrame) – src_colname – change_colname – startoffset – maxoffset –
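The behaviour described above can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name add_pct_change_cols_sketch is hypothetical, and both the reading of startoffset/maxoffset as an inclusive range of offsets and the use of change_colname as a format pattern are assumptions.

```python
import pandas as pd


def add_pct_change_cols_sketch(dframe, src_colname, change_colname,
                               startoffset=1, maxoffset=1):
    """Hypothetical stand-in: add one percent-change column per offset."""
    out = dframe.copy()
    for offset in range(startoffset, maxoffset + 1):
        # Assumed naming scheme: change_colname is formatted with the offset.
        out[change_colname.format(offset)] = out[src_colname].pct_change(periods=offset)
    return out


prices = pd.DataFrame({"price": [100.0, 110.0, 121.0, 133.1]})
result = add_pct_change_cols_sketch(prices, "price", "price_chg{0}", 1, 2)
```

Each new column starts with as many missing values as its offset, because there is nothing to compare the first rows against.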

static add_series_to_each_col(series, dataframe)¶

Given a dataframe M of columns [a, b, c, …] and a series of summands [s₀, s₁, …, sₙ], return the dataframe [a’, b’, c’, …] in which the summands have been added to each column:

a’ = [s₀ + a₀, s₁ + a₁, …, sₙ + aₙ]
b’ = [s₀ + b₀, s₁ + b₁, …, sₙ + bₙ]
c’ = [s₀ + c₀, s₁ + c₁, …, sₙ + cₙ]
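The same row-wise addition can be written directly in pandas. This sketch only illustrates the arithmetic and assumes the series and the dataframe share the same row index; it is not taken from the library's source.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
summands = pd.Series([100, 200, 300])

# axis=0 aligns the series with the row index, so summand i is added to row i
# of every column.
shifted = frame.add(summands, axis=0)
```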

static agg_duplicates(data, identifiers, unit_cols, price_cols)¶

Transforms the dataframe to eliminate duplicate observations, using logic suited to common pricing problems.

Parameters:

data – DataFrame of raw data that may have duplicates
identifiers – list of columns that identify unique entries in the data set
unit_cols – list of DataFrame columns for which duplicates should be summed
price_cols – list of DataFrame columns for which duplicates should be resolved by taking a min
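The sum/min rule described by the parameters can be sketched with a pandas groupby. The helper name agg_duplicates_sketch and the sample data are hypothetical; how the library treats columns outside unit_cols and price_cols is not documented here, so this sketch simply drops them.

```python
import pandas as pd


def agg_duplicates_sketch(data, identifiers, unit_cols, price_cols):
    """Hypothetical stand-in: sum unit columns, take the min of price columns."""
    rules = {col: "sum" for col in unit_cols}
    rules.update({col: "min" for col in price_cols})
    return data.groupby(identifiers, as_index=False).agg(rules)


raw = pd.DataFrame({
    "store": ["s1", "s1", "s2"],
    "item": ["apple", "apple", "apple"],
    "units": [3, 2, 5],
    "price": [1.10, 1.00, 1.20],
})
deduped = agg_duplicates_sketch(raw, ["store", "item"], ["units"], ["price"])
```

The two s1/apple rows collapse into one row with 5 units at the lower price of 1.00.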

static append_cols_inplace(dest_df, source_df)¶

static append_rows_inplace(dest_df, source_df, simple_index=True)¶

May change the storage type to floating-point precision. Works row-by-row, so it may be slow.

Parameters:

dest_df –
source_df –
simple_index – Two types of indexing are handled. simple_index=True (default) assumes that dest_df has a simple numerically increasing index and just extends it (the index values of source_df are ignored). simple_index=False uses the index values of source_df when adding to dest_df (which could result in overwriting if index values are duplicated).

static append_series_to_dataframe(series, dataframe)¶

Create a dataframe by appending the given series as a new column to the end of the dataframe

Parameters:

series – The new column to append
dataframe – A dataframe with a single column index to which to append the column

Returns:	A single dataframe containing the dataframe with the new column appended

static append_to_col_multiindex(dataframe, new_name, new_values)¶

static append_to_row_index(df, col_names)¶

static append_uniform_col_index(dataframe, index_value, index_name)¶

static assign_inplace(dest_df, source_df)¶

Helper for the otherwise awkward task of modifying a dataframe in place (for example, when a function needs to modify a dataframe passed in as a parameter).

Parameters:

dest_df – Currently, this needs to be an empty df
source_df –

static broadcast_scale(dataframe, scales)¶

Given a dataframe M of columns [c₀, c₁, …, cₘ] and a series of scales [s₀, s₁, …, sₙ], return the dataframe consisting of each column scaled by each scale (m * n columns):

[s₀*c₀, s₀*c₁, …, s₀*cₘ, s₁*c₀, s₁*c₁, …, s₁*cₘ, …, sₙ*c₀, sₙ*c₁, …, sₙ*cₘ] = [M*s₀, M*s₁, …, M*sₙ]
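A minimal sketch of this block layout, assuming each scale is a scalar and that the scaled copies of M are laid side by side in scale order. Labelling the blocks with the index of the scales series is an assumption; the library's actual column naming is not documented here.

```python
import pandas as pd

frame = pd.DataFrame({"c0": [1, 2], "c1": [3, 4]})
scales = pd.Series([10, 100], index=["s0", "s1"])

# One scaled copy of the frame per scale; the dict keys become the outer
# level of the resulting column index.
blocks = {name: frame * value for name, value in scales.items()}
scaled = pd.concat(blocks, axis=1)
```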

static cast_col_inplace(dataframe, colname, dtype)¶

Cast a column in a dataframe to a new type, in place.

static concat_along_rows(dataframes, levels=None, name=None)¶

Create a dataframe combining all the given dataframes

Parameters:

dataframes – List of dataframes and series, all with the same number of rows and the same index
levels –
name –

Returns:	A single dataframe containing all the given columns

static concat_frames(frames)¶

Parameters:	frames – An iterable of DataFrames
Returns:	Single concatenated data frame

static drop_col_inplace(dataframe, col)¶

static drop_dates(data, date_col, ratio_dates_to_drop)¶

Provides a subsample of the data in which whole dates are either kept or dropped, in such a way that the kept dates are equally spaced.

Parameters:

data –
date_col –
ratio_dates_to_drop – For every kept date, how many to drop. 0 means don’t drop any
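One way to get equally spaced kept dates is to keep every (ratio_dates_to_drop + 1)-th distinct date. The sketch below assumes that reading, assumes the earliest date is always kept, and uses the hypothetical name drop_dates_sketch; the library may choose the kept dates differently.

```python
import pandas as pd


def drop_dates_sketch(data, date_col, ratio_dates_to_drop):
    """Hypothetical stand-in: keep one date, drop the next `ratio` dates, repeat."""
    unique_dates = data[date_col].drop_duplicates().sort_values()
    kept = unique_dates.iloc[::ratio_dates_to_drop + 1]
    # Rows are kept or dropped as whole dates, never individually.
    return data[data[date_col].isin(kept)]


dates = pd.date_range("2020-01-01", periods=6, freq="D")
panel = pd.DataFrame({
    "date": list(dates) * 2,
    "item": ["a"] * 6 + ["b"] * 6,
})
sample = drop_dates_sketch(panel, "date", 2)
```

With six dates and a ratio of 2, only the first and fourth dates survive, for both items.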

static drop_index_level_unlabelled(df, levels=0, droplevel=True)¶

Opposite version of keep_index_level_unlabelled(). See there for help.

static drop_rows_index_vals(df, vals, level=None)¶

Like the opposite of pd.xs (except that it can’t yet take in tuples for vals and level).

Parameters:

df –
vals – vals or tuple of vals
level – level name (or int) or list of levels

Returns:	modified df

static fill_missing(data, panel_cols, date_col, day_interval, fill_nan=None, fill_zero=None)¶

Fills in missing observations between the first and last available date at the cadence specified. Newly created values for all columns default to their last available value (unless specified otherwise).

Parameters:

data – Dataframe containing raw data
panel_cols – list of columns that serve as panel identifiers
date_col – Column which indicates the date. Must be populated by datetime values
fill_nan – list of DataFrame columns which are set to nan for all missing obs
fill_zero – list of DataFrame columns which are set to zero for all missing obs
day_interval – Number of days between consecutive observations

static gen_cartesian_product(series_list, series_names)¶

Compute the Cartesian product of the given series. The built-in product forgets the types, so the results are converted back.

static gen_df_diff(df, var)¶

static gen_labels(dframe, cond)¶

For ML: generates “labels” of 0 and 1. Applies the given condition to each row in the dframe and produces a 0 or 1 label.

Parameters:	dframe (pd.DataFrame) – cond –
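As an illustration, assuming cond is a callable that receives one row and returns something truthy or falsy (the accepted form of cond is not documented above), the labelling can be sketched as follows. The name gen_labels_sketch is hypothetical.

```python
import pandas as pd


def gen_labels_sketch(dframe, cond):
    """Hypothetical stand-in: 1 where cond(row) holds, 0 otherwise."""
    return dframe.apply(cond, axis=1).astype(int)


sales = pd.DataFrame({"units": [0, 5, 12], "price": [2.0, 2.5, 1.5]})
labels = gen_labels_sketch(sales, lambda row: row["units"] > 4)
```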

static gen_pct_change(dframe, colname, offset=1)¶

Generate a series representing the percentage changes in a column’s value.

Parameters:	dframe (pd.DataFrame) – colname – offset (int) – How many intervals back should we compare to?
Returns:	Pct changes for all rows in the dataframe

static get_cell_by_multiindex(dataframe, row_level_value_by_name, col_level_value_by_name)¶

Get a single cell as specified by the given multiindex level values.

Parameters:

dataframe (pd.DataFrame) – The dataframe from which to get the value
row_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the row multiindex
col_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the col multiindex

Returns:	The specified value
Raises:	Exception raised if no such cell can be found or if multiple such cells are found.

static get_col_by_multiindex(dataframe, level_value_by_name, as_series=True)¶

Get a single column as specified by the given multiindex level values.

Parameters:

dataframe (pd.DataFrame) – The dataframe from which to get the column.
level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the multiindex
as_series (bool) – If True, return the column as a pd.Series, thereby dropping any column multiindex information. If False, return a pd.DataFrame consisting of a single column.

Returns:	The specified column, as either a pd.Series or pd.DataFrame.
Raises:	Exception raised if no such column can be found or if multiple such columns are found.

static get_col_by_name(dataframe, col_name)¶

static get_col_from_indicator(df, indicator)¶

static get_col_index_values(df, index_name)¶

static get_column_or_index(dataframe, name)¶

static get_nan_inf_indicator(df)¶

static get_row_index_values(df, index_name)¶

static groupbynot(df, not_list)¶

static impute_panel_column(dataframe, index_name, fill_value=0)¶

For a given dataframe consisting of one or more panels, fill rows such that for the given row index, each panel has a row for each known level.

For example, in the following table, there is no entry for apples on Tuesday. Therefore, that row is added with the specified fill value (here, 0.0).

NOTE: The same fill value is used across all columns, regardless of column type.

day	item	price	quality
Mon	oranges	1.0	‘good’
Tue	oranges	1.1	‘good’
Mon	apples	2.0	‘okay’
Mon	bananas	3.0	‘okay’
Tue	bananas	3.1	‘good’

day	item	price	quality
Mon	oranges	1.0	‘good’
Tue	oranges	1.1	‘good’
Mon	apples	2.0	‘okay’
Tue	apples	0.0	0.0
Mon	bananas	3.0	‘okay’
Tue	bananas	3.1	‘good’

Parameters:

dataframe – The dataframe into which to insert rows.
index_name – The name of the index level for which to fill in missing items
fill_value – The value to use for filling in missing entries

Returns:	A copy of the original dataframe with missing rows inserted.
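The row-filling step can be sketched with a reindex over every combination of observed level values. This is an illustration only: it assumes day and item form the row multiindex, and it keeps just the numeric price column from the table above for brevity.

```python
import pandas as pd

rows = [("Mon", "oranges", 1.0), ("Tue", "oranges", 1.1),
        ("Mon", "apples", 2.0),
        ("Mon", "bananas", 3.0), ("Tue", "bananas", 3.1)]
frame = pd.DataFrame(rows, columns=["day", "item", "price"]).set_index(["day", "item"])

# Every (day, item) combination built from the values seen in each level.
full_index = pd.MultiIndex.from_product(
    [frame.index.get_level_values("day").unique(),
     frame.index.get_level_values("item").unique()],
    names=["day", "item"],
)

# Rows missing from the original frame are created and filled with 0.
imputed = frame.reindex(full_index, fill_value=0)
```

The only combination missing from the input, apples on Tuesday, appears in the result with a price of 0.0.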

static interact_dataframes(dataframe_list)¶

static interact_series_with_dataframe(dataframe, scales, col_name_format=None)¶

Create a dataframe by multiplying each column in the dataframe by a corresponding series of scale values

Parameters:

dataframe – A dataframe with k columns and n rows
scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
col_name_format –

Returns:	A dataframe where each column is the pairwise product of the original column with the scale values [x₀, x₁, … xₙ] -> [x₀s₀, x₁s₁, …, xₙsₙ]

static keep_index_level_unlabelled(df, levels=0, droplevel=True)¶

Returns the modified df where rows are dropped where the index levels have values that aren’t in the label list. These end up being printed as nan. If you extract the index as idx=mi.remove_levels([<other levels>]), then math.isnan(idx.values)==True for some values.

static multiindex_merge_m1_full(dfl, dfr)¶

Does an m:1 merge from l to r, assuming dfr.index.names is a subset of those in dfl. Pandas utils don’t work.

static panel_type(df)¶

static prepend_col_names(dataframe, prefix)¶

Prepend each column name in the given dataframe with the specified prefix

Parameters:	dataframe – The dataframe in which to prepend the columns prefix – The string to pre-pend to the column names

static prepend_series_to_dataframe(series, dataframe)¶

Create a dataframe by prepending the given series as a new column at the start of the dataframe

Parameters:

series – The new column to prepend
dataframe – A dataframe with a single column index to which to prepend the column

Returns:	A single dataframe containing the dataframe with the new column prepended

static reset_row_index(df, inplace=False)¶

static select_and_drop(df, colname, val)¶

Selects on a variable and then drops it. (Like how xs selects on a value of an index-level and then drops the level.)

static set_row_index(df, col_names)¶

static stack_as_block_diagonal(dataframe, index_name)¶

For a given dataframe, add/drop columns as-needed so that its column index matches the specified target index. Columns added are filled with zeroes.

Parameters:

dataframe – The base dataframe to pivot to a block diagonal
index_name – The name of the index on which to block rows

Returns:	A dataframe where the rows have been pivoted such that each level in the index is a block in a block diagonal.

site	1 2 3
item	A B A B A B
	1 2 10 20 100 200
	3 4 30 40 300 400

=> (index_name = site)

site	1	2	3
item	A B	A B	A B
1	1 2
	3 4
2		10 20
		30 40
3			100 200
			300 400

=> (index_name = item)

site	1 2 3	1 2 3
item	A A A	B B B
A	1 10 100
	3 30 300
B		2 20 200
		4 40 400

static str_cat(dataframe, col_names, sep=None)¶

String concatenate the given columns (names). Returns a new series with the concatenated text.

static str_split_paths(dataframe, colname, sep='/')¶

If the column data is a ‘path’ into a hierarchy, this method will split the paths and return a data frame with each level in the path represented by its own column.
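The path splitting can be illustrated with pandas' own string accessor. This is a sketch of the idea rather than the library's code; the column name and sample paths are made up.

```python
import pandas as pd

catalog = pd.DataFrame({"category": ["food/fruit/apple", "food/dairy", "toys"]})

# expand=True returns one column per path level; shorter paths are padded
# with missing values.
levels = catalog["category"].str.split("/", expand=True)
```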

static using_default_index(dataframe)¶

class msecoreml.pddataframeex.UnitTimeDFType¶

Bases: enum.Enum

An enumeration.

CROSS_SECTION = 1¶

PANEL = 3¶

TIME_SERIES = 2¶

msecoreml.pdgroupbyex module¶

class msecoreml.pdgroupbyex.PdGroupByEx¶

Bases: object

LAG_COL_NAME = '{0} lag{1}'¶

LAG_DIFF_COL_NAME = '{0} lag_diff{1}'¶

LEAD_COL_NAME = '{0} lead{1}'¶

LEAD_DIFF_COL_NAME = '{0} lead_diff{1}'¶

static exp_moving_avg(groupby, halflife, min_periods)¶

Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w):

wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (wₜ·xₜ + wₜ₋₁·xₜ₋₁ + … + w₀·x₀) / (wₜ + wₜ₋₁ + … + w₀)

Parameters:

groupby –
halflife – The period of time for the exponential weight to reduce to one half
min_periods – The minimum number of non-missing values in the window, including current value, required (NaN returned otherwise)

Returns:	A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance

static gen_fore_diffs(groupby, leads)¶

Create a dataframe where each column corresponds to the difference of an offset series with the given series

Parameters:

groupby –
leads – The values by which to shift the series

Returns:	A list of #fore columns where column i is the difference of the left shift series (lag - i) and the given series

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]
In general for a given fore = l and a series of length n, colₖ = [xₗ-x₀, xₗ₊₁-x₁,…, NA]

static gen_lag_diffs(groupby, lags)¶

Create a dataframe where each column corresponds to the difference of a given series and an offset series

Parameters:

groupby –
lags – The maximum number of values by which to shift the series

Returns:	A list of #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA,…, xₖ-x₀, …, xₖ-xₙ₋ₗ₋₁₊ₖ]

static gen_lags(groupby, lag)¶

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:

groupby –
lag – The maximum number of values by which to shift the series

Returns:	A list of #lag columns where column i is a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA,…, x₀, …, xₙ₋ₗ₋₁₊ₖ]

static gen_leads(groupby, leads)¶

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:

groupby –
leads – The numbers of values by which to shift the series

Returns:	A list of #lead columns where column i is a left shift of size (lead - i) of the given series:

if n = 3 and lead = 2
series = [x₀, x₁, x₂]
col₀ = [x₁, x₂, NA]
col₁ = [x₂, NA, NA]
In general for a given lead = l and a series of length n, colₖ = [xₖ₊₁, xₖ₊₂, … xₙ, NA,…, NA]

static get_selection_name(groupby)¶

static get_shift_name(name_pattern, col_name, shift)¶

msecoreml.pdmultiindexex module¶

class msecoreml.pdmultiindexex.PdMultiIndexEx¶

Bases: object

A collection of utility methods for working with Pandas multi-indexes.

static align_index(multiindex, index_names)¶

Align given multiindex to specified index

Get an updated version of the given multiindex with indexes ordered as specified by the set of index names. Indexes not yet contained in the multiindex are added with all values missing, and names that are in the multiindex but not in index_names are dropped.

Parameters:

multiindex (`MultiIndex`) – The multiindex to update
index_names (list(str)) – An ordered list of index names for the new index

Returns:	An updated version of the given multiindex with indexes ordered as specified by the set of index names where indexes not yet contained in the multiindex are added but have all missing values.
Return type:	`MultiIndex`

For the following examples, let multiindex be the following column multiindex

letter	A	B	A	B	A	A
number	1	1	1	2	2	2
blah	0	0	A	A	C	C

Example:

Input:

index_names: [number, missing, blah, letter]

Output:

number	1	1	1	2	2	2
missing	None	None	None	None	None	None
letter	A	B	A	B	A	A
blah	0	0	A	A	C	C

Example:

Input:

index_names: [number, missing]

Output:

number	1	1	1	2	2	2
missing	None	None	None	None	None	None

static concat_index_levels(midx, sep='_', filter_blanks=True, name=None)¶

Takes all the levels of a multiindex, converts them to strings and then concatenates them into a plain Index.

Parameters:

midx –
sep (str) – Separator
filter_blanks – Whether to put sep only between non-blank entries
name – Final name for the Index

static fillna_multiindex(idx, value='', level_nums=None)¶

Sometimes a level l in a multi-index will have NaNs, stored as -1 in .labels[l] for something not in .levels[l].

.get_level_values(l) #will return things with math.nan
.fillna()/.hasnans #aren't implemented
df.index.set_levels([l.fillna(value) for l in df.index.levels], inplace=True) #Doesn't work
index.get_level_values(l).fillna('') #returns some right
df.index.set_levels([df.index.get_level_values(l).fillna('') for l in range(len(df.index.levels))], 
                    inplace=True) #Doesn't work

static get_1level_block_indicator(multiindex, index_name, index_values)¶

Get an indicator array indicating the positions in the multiindex with one of the specified values

Parameters:

multiindex (`MultiIndex`) – The multiindex for which to get an indicator
index_name (str) – The name of the index for which to build an indicator for the specified values
index_values (list(str or int)) – A list of values for which the indicator should be True

Returns:	A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations that have one of the specified values.
Return type:	`numpy.array` of bool

For the following examples, let the multiindex be the column index of the following dataframe

site	1	1	2	2	3	NaN
item	A	B	A	B	A	B
	1	2	10	20	100	200
	3	4	30	40	300	400

Example:

Input:

index_name: site

index_values: [1, 2]

Output:

[True, True, True, True, False, False]

Example:

Input:

index_name: site

index_values: [NaN]

Output:

[False, False, False, False, True, True]
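The first example above (index_values: [1, 2]) can be reproduced with pandas' own index methods. This is a sketch of the idea using get_level_values and isin, not the library's implementation, and it does not cover the NaN case.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3, np.nan], ["A", "B", "A", "B", "A", "B"]],
    names=["site", "item"],
)

# One boolean per position of the multiindex: True where site is 1 or 2.
indicator = columns.get_level_values("site").isin([1, 2])
```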

static get_aligned_index(source_index, target_index_names)¶

static get_levelvalues_by_name(multiindex, index_name)¶

Return the list of level values for the index with the specified name

Parameters:	multiindex – The multi-index containing the specified target index index_name – The name of the index for which to get the level values
Returns:	A list of level values for the index with the specified name

site	1 1 2 2 3 3
item	A B A B A B
	1 2 10 20 100 200
	3 4 30 40 300 400

index_name = site => [1, 2, 3] index_name = item => [A, B]

static get_nlevel_block_indicator(multiindex, index_value_byname)¶

Get an indicator array indicating the positions in the multiindex with the specified values

Parameters:

multiindex (`MultiIndex`) – The multiindex for which to get an indicator
index_value_byname (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex.

Returns:	A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations that have the specified values.
Return type:	`numpy.array` of bool

For the following examples, let the multiindex be the column index of the following dataframe

site	1	1	2	NaN	3	NaN
item	A	B	A	B	A	B
	1	2	10	20	100	200
	3	4	30	40	300	400

Example:

Input:

value_by_name: {site:1, item:B}

Output:

[False, True, False, False, False, False]

Example:

Input:

value_by_name: {site:NaN, item:B}

Output:

[False, False, False, True, False, True]
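Both examples above can be reproduced by AND-ing one mask per index level. The helper names below are hypothetical and the sketch is not the library's code; the point it illustrates is that a NaN target has to be matched with isna(), since NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd


def level_matches(values, target):
    """Positions of an index level that match the target, treating NaN as a value."""
    return values.isna() if pd.isna(target) else values == target


def block_indicator_sketch(multiindex, index_value_byname):
    """Hypothetical stand-in: True where every named level has its target value."""
    mask = np.ones(len(multiindex), dtype=bool)
    for name, target in index_value_byname.items():
        mask &= np.asarray(level_matches(multiindex.get_level_values(name), target))
    return mask


columns = pd.MultiIndex.from_arrays(
    [[1, 1, 2, np.nan, 3, np.nan], ["A", "B", "A", "B", "A", "B"]],
    names=["site", "item"],
)
first = block_indicator_sketch(columns, {"site": 1, "item": "B"})
second = block_indicator_sketch(columns, {"site": np.nan, "item": "B"})
```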

static get_nlevel_singleton_indicator(multiindex, value_by_name)¶

Get an indicator array indicating the single position in the multiindex with the specified values

Parameters:

multiindex (`MultiIndex`) – The multiindex for which to get an indicator
value_by_name (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex

Returns:	A boolean array corresponding to the values of the multiindex with a single True value corresponding to the multiindex location with the specified values.
Return type:	`numpy.array` of bool

For the following examples, let the multiindex be the column index of the following dataframe

site	1	1	2	2	3	NaN
item	A	B	A	B	A	B
	1	2	10	20	100	200
	3	4	30	40	300	400

Example:

Input:

value_by_name: {site:1, item:B}

Output:

[False, True, False, False, False, False]

Example:

Input:

value_by_name: {site:NaN, item:B}

Output:

[False, False, False, False, False, True]

static get_some_levels_block_indicator(multiindex, index_value_byname)¶

Get an indicator array indicating the positions in the multiindex with the specified values

Parameters:

multiindex – The multiindex for which to get an indicator
index_value_byname – A dictionary containing a mapping of index name to target value for some (not necessarily all) indices in the multiindex.

Returns:	A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations that have the specified values.

site	1 1 2 2 NaN NaN
item	A B A B A B
	1 2 10 20 100 200
	3 4 30 40 300 400

{site:1, item:B} => [False, True, False, False, False, False]
{site:NaN, item:B} => [False, False, False, False, True, True]

static midx_from_list_of_namedarray(na_list)¶

Create a MultiIndex from a list of Index or pd.Series objects

Parameters:	na_list – list of Index or pd.Series objects (each needs .values and .name)

static rename_mi_level_names(idx, level, dict_map)¶

static to_frame(midx, index=True)¶

msecoreml.pdonehotencoder module¶

class msecoreml.pdonehotencoder.PdOneHotEncoder(drop_base=True, base_criteria='freq')¶

Bases: msecoreml.basetransformer.BaseTransformer

Wrapper for pandas.Series One-Hot encoder

The transformed values will adhere to the following:

The row index of the returned dataframe will equal that of the given series (the series to be transformed)

The column index of the returned dataframe will contain a single index with name equal to the name of the series, and the level of each column corresponds to the value of the item that column encodes.

Note:

The empty string is encoded like any other string.

None is treated as missing and is not encoded.

Letter	Number	ValueToEncode
A	1	“Coke”
A	2	“Pepsi”
B	1	“Tab”
B	2	“Pepsi”
C	1	“”
C	2	None

The following dataframe is returned, where the column multi-index contains a single index ValueToEncode with levels [Coke, Pepsi, Tab]

Letter	Number	“Coke”	“Pepsi”	“Tab”	“”
A	1	1	0	0	0
A	2	0	1	0	0
B	1	0	0	1	0
B	2	0	1	0	0
C	1	0	0	0	1
C	2	0	0	0	0

BASE_FREQ = 'freq'¶

__init__(drop_base=True, base_criteria='freq')¶

Parameters:

drop_base – When returning the transformation, whether to include one-hots for all values or for all-minus-one. The latter is sometimes called “dummy variables”, in contrast to “normal” one-hot encoding, which covers all values.
base_criteria – How to determine the base. Options: freq (string), the modal value, which works better with penalization; <number> (number as string), the value to use as the base.

base_criteria¶

drop_base¶

fit(series)¶

Fit the encoder to the given series

Parameters:	series – The series to which to fit an encoding

fit_transform(series)¶

Fit an encoder to the given series and return the encoded values of the series

Parameters:	series – The series to which to fit an encoding and then transform.
Returns:	The dataframe resulting from encoding the given series

labels¶

transform(series)¶

Encode the given series using the previously fit encoding

Parameters:	series – The series to encode
Returns:	The dataframe resulting from encoding the given series

If a new value is encountered in the series (i.e., a value that was not present in the series used to fit the encoder), the resulting encoding for that value will be all zeroes. For example,

Fit:

ValueToEncode	Coke	Pepsi
Coke	1	0
Pepsi	0	1
Pepsi	0	1

Transform:

ValueToEncode	==>	ValueToEncode	Coke	Pepsi
Coke			1	0
Tab			0	0
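The all-zeroes behaviour for unseen values can be mimicked with pandas.get_dummies plus a column reindex. This sketch is not the class's implementation; it ignores drop_base (every fitted value keeps its column) and only illustrates the fit/transform alignment shown in the tables above.

```python
import pandas as pd

train = pd.Series(["Coke", "Pepsi", "Pepsi"], name="ValueToEncode")
new = pd.Series(["Coke", "Tab"], name="ValueToEncode")

# "Fit": remember the columns produced for the training values.
fitted_columns = pd.get_dummies(train, dtype=int).columns

# "Transform": encode, then align to the fitted columns. The unseen value
# "Tab" loses its own column, which leaves an all-zero row.
encoded = pd.get_dummies(new, dtype=int).reindex(columns=fitted_columns, fill_value=0)
```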

msecoreml.pdseriesex module¶

class msecoreml.pdseriesex.PdSeriesEx¶

Bases: object

static avg_across_series(series_collection)¶

Create a series by taking the average across the given set of series

Parameters:	series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]]
Returns:	The elementwise average of the series [(x1₀ + x2₀ + .. + xK₀) / K, (x1₁ + x2₁ + .. + xK₁) / K,… ]
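The elementwise average (and the matching elementwise sum) can be sketched by placing the series side by side and reducing across columns. This illustrates the arithmetic only and assumes all series share the same index.

```python
import pandas as pd

series_collection = [
    pd.Series([1.0, 2.0]),
    pd.Series([3.0, 4.0]),
    pd.Series([5.0, 9.0]),
]

# One column per series, then reduce across the columns for each row.
stacked = pd.concat(series_collection, axis=1)
average = stacked.mean(axis=1)
total = stacked.sum(axis=1)
```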

static concat_along_rows(series, levels=None, name=None)¶

Create a dataframe combining all the given series

Parameters:	series – List of series levels – name –
Returns:	A single dataframe containing all the given columns

static exp_moving_avg(series, halflife, min_periods)¶

Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w):

wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (wₜ·xₜ + wₜ₋₁·xₜ₋₁ + … + w₀·x₀) / (wₜ + wₜ₋₁ + … + w₀)

Parameters:

series – data
halflife – The period of time for the exponential weight to reduce to one half
min_periods – The minimum number of non-missing values in the window required (NaN returned otherwise)

Returns:	A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance
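The weighting can be checked numerically against pandas' built-in ewm. The sketch assumes that the exponent i counts how many steps old an observation is, so the newest value gets weight 1; that is how pandas' adjusted ewm behaves, but whether this method delegates to it is an assumption.

```python
import math

import pandas as pd

values = pd.Series([1.0, 2.0, 4.0])
halflife = 2.0

# pandas' adjusted exponentially weighted mean.
ewma = values.ewm(halflife=halflife, min_periods=1).mean()

# Manual check for the last instance, using the weight definition above.
decay = math.exp(math.log(0.5) / halflife)
weights = [decay ** i for i in range(len(values))]  # i = 0 is the newest value
newest_first = list(values)[::-1]
manual_last = sum(w * x for w, x in zip(weights, newest_first)) / sum(weights)
```

With a halflife of 2, an observation two steps back carries exactly half the weight of the newest one.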

static gen_fore_diffs(series, fore)¶

Create a dataframe where each column corresponds to the difference of an offset series with the given series

Parameters:

series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
fore – The maximum number of values by which to shift the series

Returns:	A dataframe with #fore columns where column i is the difference of the left shift series (lag - i) and the given series

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]
In general for a given fore = l and a series of length n, colₖ = [xₗ-x₀, xₗ₊₁-x₁,…, NA]

static gen_lag_diffs(series, lag)¶

Create a dataframe where each column corresponds to the difference of a given series and an offset series

Parameters:

series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
lag – The maximum number of values by which to shift the series

Returns:	A dataframe with #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA,…, xₖ-x₀, …, xₖ-xₙ₋ₗ₋₁₊ₖ]

static gen_lags(series, lag)¶

Create a dataframe where each column corresponds to an offset series of the given series

Parameters:

series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
lag – The maximum number of values by which to shift the series

Returns:	A dataframe with #lag columns where column i is a right shift of size (lag - i) of the given series:

if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA,…, x₀, …, xₙ₋ₗ₋₁₊ₖ]
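The n = 3, lag = 2 example can be reproduced with Series.shift. This is a sketch of the described output rather than the library's code; the column names follow the '{0} lag{1}' pattern listed for PdGroupByEx, and using that pattern here is an assumption.

```python
import pandas as pd

series = pd.Series([10, 20, 30], name="x")
lag = 2

# Column i is the series shifted right by (lag - i) positions.
lags = pd.concat(
    {"{0} lag{1}".format(series.name, lag - i): series.shift(lag - i)
     for i in range(lag)},
    axis=1,
)
```

Subtracting each shifted column from the original series gives the differences described for gen_lag_diffs.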

static gen_leads(series, lead)¶

static gen_log_series(series)¶

Create a series by taking the log of each instance in the given series.

Parameters:	series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
Returns:	A series the log values of the given series [log(x₀), log(x₁), …, log(xₙ)]

static gen_scaled_series(series, scales)¶

Create a series by multiplying a given series by a corresponding series of scale values

Parameters:	series – A 1-dimensional pandas series [x₀, x₁, … xₙ] scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
Returns:	The pairwise product of the two series [x₀s₀, x₁s₁, …, xₙsₙ]

static get_nan_inf_indicator(series)¶

static root_meansqr(series)¶

static sum_across_series(series_collection)¶

Create a series by taking the sum across the given set of series

Parameters:	series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]]
Returns:	The elementwise sum of the series [(x1₀ + x2₀ + .. + xK₀), (x1₁ + x2₁ + .. + xK₁),… ]

msecoreml.sample_splitting module¶

class msecoreml.sample_splitting.SampleSplitting¶

Bases: object

Classes for dealing with fold info

static filter_folds(row_selector, folds=None)¶

Parameters:

row_selector – 1/0 array
folds – folds

static filter_index_list(row_selector_bool, index_list)¶

class msecoreml.sample_splitting.SingleFoldFullOverlap¶

Bases: object

Looks like a KFold-type object, but returns a single fold with all (non-label) indexes in both train and test. Used for the NO_SPLIT option in DoubleML or DynamicDML.

get_n_splits(X)¶

n_splits¶

split(X, y=None)¶
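A KFold-like object of this shape might look as follows. The class name SingleFoldSketch is hypothetical and the sketch ignores any special handling of label indexes; it only shows the single fold in which the train and test indices coincide.

```python
import numpy as np


class SingleFoldSketch:
    """Hypothetical KFold-like splitter: one fold, train indices == test indices."""

    n_splits = 1

    def get_n_splits(self, X=None):
        return self.n_splits

    def split(self, X, y=None):
        indices = np.arange(len(X))
        # The same index array is yielded for both train and test.
        yield indices, indices


folds = list(SingleFoldSketch().split([[0.1], [0.2], [0.3]]))
```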

Module contents¶