msecoreml package¶
Submodules¶
msecoreml.pddataframeex module¶
-
class
msecoreml.pddataframeex.
PdDataframeEx
¶ Bases:
object
A collection of static utility methods that wrap common commands
-
static
add_pct_change_cols
(dframe, src_colname, change_colname, startoffset=1, maxoffset=1)¶ Compute the percentage changes in a column’s value for multiple intervals. Add the changes for each interval as a new column to the data frame. E.g. compute the percent change relative to the last 5 measurements
Parameters: - dframe (pd.DataFrame) –
- src_colname –
- change_colname –
- startoffset –
- maxoffset –
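The multi-interval idea can be sketched with plain pandas. This is not the msecoreml implementation: the helper name `add_pct_change_cols_sketch` and the "<change_colname><offset>" column naming are assumptions made for illustration.

```python
import pandas as pd


def add_pct_change_cols_sketch(dframe, src_colname, change_colname,
                               startoffset=1, maxoffset=1):
    """Hypothetical stand-in: one percentage-change column per offset."""
    out = dframe.copy()
    for offset in range(startoffset, maxoffset + 1):
        previous = out[src_colname].shift(offset)
        # The "<change_colname><offset>" naming is an assumption for this sketch.
        out[f"{change_colname}{offset}"] = out[src_colname] / previous - 1
    return out


prices = pd.DataFrame({"price": [100.0, 110.0, 121.0]})
with_changes = add_pct_change_cols_sketch(prices, "price", "chg", 1, 2)
```

Rows that have no earlier measurement at a given offset come out as NaN.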
-
static
add_series_to_each_col
(series, dataframe)¶ Given a dataframe M of columns [a, b, c, …] and series of summands [s₀, s₁, …, sₙ], return the dataframe consisting of the summand added to each column [a’, b’, c’, …] => a’ = [s₀ + a₀, s₁ + a₁, …, sₙ + aₙ] => b’ = [s₀ + b₀, s₁ + b₁, …, sₙ + bₙ] => c’ = [s₀ + c₀, s₁ + c₁, …, sₙ + cₙ]
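The row-wise addition described above can be reproduced with plain pandas. This is a sketch of the behaviour, assuming the series and the dataframe share the same row index; it does not call the library itself.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
summands = pd.Series([100, 200, 300])

# axis=0 aligns the series with the row index, so summands[i] is added
# to row i of every column.
shifted = df.add(summands, axis=0)
```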
-
static
agg_duplicates
(data, identifiers, unit_cols, price_cols)¶ Transforms the dataframe to eliminate duplicate observations, using logic that corresponds to common pricing problems.
Parameters: - data – DataFrame of raw data that may have duplicates
- identifiers – list of columns that identify unique entries in the data set
- unit_cols – list of DataFrame columns for which duplicates should be summed
- price_cols – list of DataFrame columns for which duplicates should be resolved by taking a min
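A minimal sketch of the sum/min rule using a pandas groupby. The helper name `agg_duplicates_sketch` is hypothetical, and the real method may handle columns outside `unit_cols` and `price_cols` differently.

```python
import pandas as pd


def agg_duplicates_sketch(data, identifiers, unit_cols, price_cols):
    """Hypothetical stand-in: sum unit columns, take the min of price columns."""
    agg_spec = {col: "sum" for col in unit_cols}
    agg_spec.update({col: "min" for col in price_cols})
    return data.groupby(identifiers, as_index=False).agg(agg_spec)


raw = pd.DataFrame({
    "store": ["s1", "s1", "s2"],
    "item": ["apple", "apple", "apple"],
    "units": [3, 2, 5],
    "price": [1.20, 1.00, 1.10],
})
deduped = agg_duplicates_sketch(raw, ["store", "item"], ["units"], ["price"])
```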
-
static
append_cols_inplace
(dest_df, source_df)¶
-
static
append_rows_inplace
(dest_df, source_df, simple_index=True)¶ May change the storage type to floating-point precision. Works row-by-row, so it may be slow.
Parameters: - dest_df –
- source_df –
- simple_index – Two types of indexing are handled. simple_index=True (default) assumes that dest_df has a simple numerically increasing index and just extends it (so the index values of source_df are ignored). simple_index=False uses the index values of source_df when adding to dest_df (which could result in overwriting if the index values are duplicated).
-
static
append_series_to_dataframe
(series, dataframe)¶ Create a dataframe by appending the given series as a new column to the end of the dataframe
Parameters: - series – The new column to append
- dataframe – A dataframe with a single column index to which to append the column
Returns: A single dataframe containing the dataframe with the new column appended
-
static
append_to_col_multiindex
(dataframe, new_name, new_values)¶
-
static
append_to_row_index
(df, col_names)¶
-
static
append_uniform_col_index
(dataframe, index_value, index_name)¶
-
static
assign_inplace
(dest_df, source_df)¶ Assign the contents of source_df to dest_df in place. It is otherwise hard to modify a dataframe in place (for example, when modifying function parameters).
Parameters: - dest_df – Currently, this needs to be an empty df
- source_df –
-
static
broadcast_scale
(dataframe, scales)¶ Given a dataframe M of columns [c₀, c₁, …, cₘ] and series of scales [s₀, s₁, …, sₙ], return the dataframe consisting of each column scaled by each scale (m * n columns) => [s₀*c₀, s₀*c₁, …, s₀*cₘ, s₁*c₀, s₁*c₁, …, s₁*cₘ, …, sₙ*c₀, sₙ*c₁, …, sₙ*cₘ] => [M*s₀, M*s₁, …, M*sₙ]
-
static
cast_col_inplace
(dataframe, colname, dtype)¶ Cast a column in a dataframe to a new type - INPLACE
-
static
concat_along_rows
(dataframes, levels=None, name=None)¶ Create a dataframe combining all the given dataframes
Parameters: - dataframes – List of dataframes and series, all with the same number of rows and the same index
- levels –
- name –
Returns: A single dataframe containing all the given columns
-
static
concat_frames
(frames)¶ Parameters: frames – An iterable of DataFrame Returns: Single concatenated data frame
-
static
drop_col_inplace
(dataframe, col)¶
-
static
drop_dates
(data, date_col, ratio_dates_to_drop)¶ Provides a subsample of the data where whole dates are either kept or dropped. Does so in a way that keeps the retained dates equally spaced.
Parameters: - data –
- date_col –
- ratio_dates_to_drop – For every kept date, how many to drop. 0 means don’t drop any
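One way to read the ratio is "keep one date, then skip ratio_dates_to_drop dates". The sketch below follows that reading with plain pandas; the helper name `drop_dates_sketch` is hypothetical, and the real method may choose the kept dates differently.

```python
import pandas as pd


def drop_dates_sketch(data, date_col, ratio_dates_to_drop):
    """Hypothetical stand-in: keep every (ratio + 1)-th distinct date."""
    dates = data[date_col].drop_duplicates().sort_values()
    kept = dates.iloc[::ratio_dates_to_drop + 1]
    # Whole dates are kept or dropped, so all rows of a kept date survive.
    return data[data[date_col].isin(kept)]


observations = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02",
                            "2020-01-03", "2020-01-04"]),
    "value": [1, 2, 3, 4, 5],
})
subsample = drop_dates_sketch(observations, "date", 1)
everything = drop_dates_sketch(observations, "date", 0)
```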
-
static
drop_index_level_unlabelled
(df, levels=0, droplevel=True)¶ Opposite version of keep_index_level_unlabelled(). See there for help
-
static
drop_rows_index_vals
(df, vals, level=None)¶ Like the opposite of pd.xs (except that tuples for vals and level are not yet supported).
Parameters: - df –
- vals – vals or tuple of vals
- level – level name (or int) or list of levels
Returns: modified df
-
static
fill_missing
(data, panel_cols, date_col, day_interval, fill_nan=None, fill_zero=None)¶ Fills in missing observations between the first and last available date at the cadence specified. Newly created values for all columns default to their last available value (unless specified otherwise).
Parameters: - data – Dataframe containing raw data
- panel_cols – list of columns that serve as panel identifiers
- date_col – Column which indicates the date. Must be populated by datetime values
- fill_nan – list of DataFrame columns which are set to nan for all missing obs
- fill_zero – list of DataFrame columns which are set to zero for all missing obs
- day_interval – Number of days between consecutive observations
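The default behaviour (carry the last available value forward) can be sketched per panel with reindex and ffill. The helper name `fill_missing_sketch` is hypothetical, and the `fill_nan` / `fill_zero` overrides are deliberately left out of this sketch.

```python
import pandas as pd


def fill_missing_sketch(data, panel_cols, date_col, day_interval):
    """Hypothetical stand-in: forward-fill each panel onto a regular date grid."""
    pieces = []
    for _, panel in data.groupby(panel_cols):
        panel = panel.set_index(date_col).sort_index()
        grid = pd.date_range(panel.index.min(), panel.index.max(),
                             freq=f"{day_interval}D")
        # Rows created by reindex start as NaN and take the last available value.
        panel = panel.reindex(grid).ffill()
        pieces.append(panel.rename_axis(date_col).reset_index())
    return pd.concat(pieces, ignore_index=True)


raw = pd.DataFrame({
    "item": ["a", "a", "b", "b"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-03",
                            "2020-01-01", "2020-01-02"]),
    "price": [1.0, 3.0, 5.0, 6.0],
})
completed = fill_missing_sketch(raw, ["item"], "date", 1)
```

Here item "a" gains a row for 2020-01-02 whose price repeats the value from 2020-01-01.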
-
static
gen_cartesian_product
(series_list, series_names)¶ Compute the cartesian product. The built-in loses the types, so they are converted back.
-
static
gen_df_diff
(df, var)¶
-
static
gen_labels
(dframe, cond)¶ For ML - generates “labels” of 0 and 1. Applies the given condition to each row in the dframe and produces a 0 or 1 label.
Parameters: - dframe (pd.DataFrame) –
- cond –
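A short sketch of row-wise labelling, assuming cond is a callable that takes a row and returns a boolean. The type of cond is not documented above, so treat that as an assumption.

```python
import pandas as pd

samples = pd.DataFrame({"x": [1, 5, 3]})


def cond(row):
    # Hypothetical condition: label a row 1 when x exceeds 2.
    return row["x"] > 2


# Apply the condition to every row and convert True/False to 1/0.
labels = samples.apply(cond, axis=1).astype(int)
```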
-
static
gen_pct_change
(dframe, colname, offset=1)¶ Generate a series representing the percentage changes in a column’s value.
Parameters: - dframe (pd.DataFrame) –
- colname –
- offset (int) – How many intervals back should we compare to?
Returns: Pct changes for all rows in the dataframe
-
static
get_cell_by_multiindex
(dataframe, row_level_value_by_name, col_level_value_by_name)¶ Get a single cell as specified by the given multiindex level values.
Parameters: - dataframe (pd.DataFrame) – The dataframe from which to get the value
- row_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the row multiindex
- col_level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the col multiindex
Returns: The specified value
Raises: Exception raised if no such cell can be found or if multiple such cells are found.
-
static
get_col_by_multiindex
(dataframe, level_value_by_name, as_series=True)¶ Get a single column as specified by the given multiindex level values.
Parameters: - dataframe (pd.DataFrame) – The dataframe from which to get the column.
- level_value_by_name (dictionary) – dictionary mapping index name to level value, specifying a level value for each index in the multiindex
- as_series (bool) – If True, return the column as a pd.Series, thereby dropping any column multiindex information. If False, return a pd.DataFrame consisting of a single column.
Returns: The specified column, as either a pd.Series or pd.DataFrame.
Raises: Exception raised if no such column can be found or if multiple such columns are found.
-
static
get_col_by_name
(dataframe, col_name)¶
-
static
get_col_from_indicator
(df, indicator)¶
-
static
get_col_index_values
(df, index_name)¶
-
static
get_column_or_index
(dataframe, name)¶
-
static
get_nan_inf_indicator
(df)¶
-
static
get_row_index_values
(df, index_name)¶
-
static
groupbynot
(df, not_list)¶
-
static
impute_panel_column
(dataframe, index_name, fill_value=0)¶ For a given dataframe consisting of one or more panels, fill rows such that for the given row index, each panel has a row for each known level.
For example, in the following table, there is no entry for apples on Tuesday. Therefore, that row is added with the specified fill value (here, 0.0).
NOTE: The same fill value is used across all columns, regardless of column type.
day  item     price  quality
Mon  oranges  1.0    ‘good’
Tue  oranges  1.1    ‘good’
Mon  apples   2.0    ‘okay’
Mon  bananas  3.0    ‘okay’
Tue  bananas  3.1    ‘good’
=>
day  item     price  quality
Mon  oranges  1.0    ‘good’
Tue  oranges  1.1    ‘good’
Mon  apples   2.0    ‘okay’
Tue  apples   0.0    0.0
Mon  bananas  3.0    ‘okay’
Tue  bananas  3.1    ‘good’
Parameters: - dataframe – The dataframe into which to insert rows.
- index_name – The name of the index level for which to fill in missing items
- fill_value – The value to use for filling in missing entries
Returns: A copy of the original dataframe with missing rows inserted.
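The apples example can be approximated with a reindex over the full product of index levels. This is a sketch in plain pandas, not the library code; the row order of the result follows the product and may differ from what the real method returns.

```python
import pandas as pd

rows = pd.DataFrame({
    "day": ["Mon", "Tue", "Mon", "Mon", "Tue"],
    "item": ["oranges", "oranges", "apples", "bananas", "bananas"],
    "price": [1.0, 1.1, 2.0, 3.0, 3.1],
    "quality": ["good", "good", "okay", "okay", "good"],
})
# Object dtype lets the text column accept a numeric fill value.
panel = rows.astype({"quality": object}).set_index(["day", "item"])

# Every (day, item) combination that should exist after imputation.
full_index = pd.MultiIndex.from_product(
    [panel.index.get_level_values("day").unique(),
     panel.index.get_level_values("item").unique()],
    names=["day", "item"],
)
# The same fill value lands in every column, regardless of column type.
imputed = panel.reindex(full_index, fill_value=0)
```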
-
static
interact_dataframes
(dataframe_list)¶
-
static
interact_series_with_dataframe
(dataframe, scales, col_name_format=None)¶ Create a dataframe by multiplying each column in the dataframe by a corresponding series of scale values
Parameters: - dataframe – A dataframe with k columns and n rows
- scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
- col_name_format –
Returns: A dataframe where each column is the pairwise product of the original column with the scale values [x₀, x₁, … xₙ] -> [x₀s₀, x₁s₁, …, xₙsₙ]
-
static
keep_index_level_unlabelled
(df, levels=0, droplevel=True)¶ Returns the modified df with rows dropped where the index levels have values that aren’t in the label list. Such values end up being printed as nan. If you extract the index as idx=mi.remove_levels([<other levels>]), then math.isnan(idx.values)==True for some values.
-
static
multiindex_merge_m1_full
(dfl, dfr)¶ Does a m:1 merge from l to r assuming dfr.index.names is a subset of those in dfl. Pandas utils don’t work.
-
static
panel_type
(df)¶
-
static
prepend_col_names
(dataframe, prefix)¶ Prepend each column name in the given dataframe with the specified prefix
Parameters: - dataframe – The dataframe in which to prepend the columns
- prefix – The string to pre-pend to the column names
-
static
prepend_series_to_dataframe
(series, dataframe)¶ Create a dataframe by prepending the given series as a new column at the start of the dataframe
Parameters: - series – The new column to prepend
- dataframe – A dataframe with a single column index to which to prepend the column
Returns: A single dataframe containing the dataframe with the new column prepended
-
static
reset_row_index
(df, inplace=False)¶
-
static
select_and_drop
(df, colname, val)¶ Selects on a variable and then drops it. (Like how xs selects on a value of an index-level and then drops the level.)
-
static
set_row_index
(df, col_names)¶
-
static
stack_as_block_diagonal
(dataframe, index_name)¶ For a given dataframe, add/drop columns as-needed so that its column index matches the specified target index. Columns added are filled with zeroes.
Parameters: - dataframe – The base dataframe to pivot to a block diagonal
- index_name – The name of the index on which to block rows
Returns: A dataframe where the rows have been pivoted such that each level in the index is a block in a block diagonal.
site 1 2 3 item A B A B A B 1 2 10 20 100 200 3 4 30 40 300 400 => (index_name = site)
site 1 2 3 item A B A B A B 1 1 2 3 4 2 10 20 30 40 3 100 200 300 400 => (index_name = item)
site 1 2 3 1 2 3 item A A A B B B A 1 10 100 3 30 300 B 2 20 200 4 40 400
-
static
str_cat
(dataframe, col_names, sep=None)¶ String-concatenate the given columns (by name). Returns a new series with the concatenated text.
-
static
str_split_paths
(dataframe, colname, sep='/')¶ If the column data is a ‘path’ into a hierarchy, this method splits the paths and returns a data frame with each level in the path represented by its own column.
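The same kind of split is available in plain pandas through the string accessor. The sketch below shows the expected shape of the result; the column names the library assigns to each level are not documented here, so the default integer names are kept.

```python
import pandas as pd

catalog = pd.DataFrame({"path": ["food/fruit/apple", "food/dairy"]})

# expand=True returns one column per path level; shorter paths are padded
# with missing values.
levels = catalog["path"].str.split("/", expand=True)
```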
-
static
using_default_index
(dataframe)¶
-
msecoreml.pdgroupbyex module¶
-
class
msecoreml.pdgroupbyex.
PdGroupByEx
¶ Bases:
object
-
LAG_COL_NAME
= '{0} lag{1}'¶
-
LAG_DIFF_COL_NAME
= '{0} lag_diff{1}'¶
-
LEAD_COL_NAME
= '{0} lead{1}'¶
-
LEAD_DIFF_COL_NAME
= '{0} lead_diff{1}'¶
-
static
exp_moving_avg
(groupby, halflife, min_periods)¶ Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w):
wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (w₀·xₜ + w₁·xₜ₋₁ + … + wₜ·x₀) / (w₀ + w₁ + … + wₜ)
Parameters: - groupby –
- halflife – The period of time for the exponential weight to reduce to one half
- min_periods – The minimum number of non-missing values in the window, including current value, required (NaN returned otherwise)
Returns: A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance
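A per-group moving average of this kind can be sketched with pandas' own ewm, which takes the same halflife and min_periods arguments. Whether the library delegates to ewm is an assumption; the column names below are made up for the example.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "units": [1.0, 3.0, 10.0, 20.0],
})

# Each group is averaged on its own, so values never leak across stores.
ewma = sales.groupby("store")["units"].transform(
    lambda s: s.ewm(halflife=1, min_periods=1).mean()
)
```

With halflife=1 the previous value carries weight 0.5, so the second value for store "a" is (3.0 + 0.5 * 1.0) / 1.5.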
-
static
gen_fore_diffs
(groupby, leads)¶ Create a dataframe where each column corresponds to the difference of an offset series with the given series
Parameters: - groupby –
- leads – The values by which to shift the series
Returns: A list of #fore columns where column i is the difference of the left-shifted series (shift of size fore - i) and the given series
if n = 3 and fore = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]
In general for a given fore = l and a series of length n, colₖ = [xₗ₋ₖ-x₀, xₗ₋ₖ₊₁-x₁, …, NA]
-
static
gen_lag_diffs
(groupby, lags)¶ Create a dataframe where each column corresponds to the difference of a given series and an offset series
Parameters: - groupby –
- lags – The maximum number of values by which to shift the series
Returns: A list of #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:
if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, …, NA, xₗ₋ₖ-x₀, …, xₙ₋₁-xₙ₋ₗ₋₁₊ₖ]
-
static
gen_lags
(groupby, lag)¶ Create a dataframe where each column corresponds to an offset series of the given series
Parameters: - groupby –
- lag – The maximum number of values by which to shift the series
Returns: A list of #lag columns where column i is a right shift of size (lag - i) of the given series:
if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA, …, x₀, …, xₙ₋ₗ₋₁₊ₖ]
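Per-group lags of this shape can be sketched with the groupby shift from plain pandas. Using the LAG_COL_NAME pattern to name the columns is an assumption about how that constant is applied; the data and column names are made up.

```python
import pandas as pd

LAG_COL_NAME = "{0} lag{1}"  # pattern copied from the class attribute above

sales = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b"],
    "units": [1, 2, 3, 10, 20],
})
grouped = sales.groupby("store")["units"]

lag = 2
# Column i is a right shift of size (lag - i); shifts stay inside each group.
lags = pd.DataFrame({
    LAG_COL_NAME.format("units", lag - i): grouped.shift(lag - i)
    for i in range(lag)
})
```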
-
static
gen_leads
(groupby, leads)¶ Create a dataframe where each column corresponds to an offset series of the given series
Parameters: - groupby –
- leads – The numbers of values by which to shift the series
Returns: A list of #lead columns where column i is a left shift of size (lead - i) of the given series:
if n = 3 and lead = 2
series = [x₀, x₁, x₂]
col₀ = [x₁, x₂, NA]
col₁ = [x₂, NA, NA]
In general for a given lead = l and a series of length n, colₖ = [xₖ₊₁, xₖ₊₂, …, xₙ₋₁, NA, …, NA]
-
static
get_selection_name
(groupby)¶
-
static
get_shift_name
(name_pattern, col_name, shift)¶
-
msecoreml.pdmultiindexex module¶
-
class
msecoreml.pdmultiindexex.
PdMultiIndexEx
¶ Bases:
object
A collection of utility methods for working with Pandas multi-indexes.
-
static
align_index
(multiindex, index_names)¶ Align given multiindex to specified index
Get an updated version of the given multiindex with indexes ordered as specified by the set of index names. Indexes not yet contained in the multiindex are added with all missing values, and names in the multiindex but not in index_names are dropped.
Parameters: - multiindex (MultiIndex) – The multiindex to update
- index_names (list(str)) – An ordered list of index names for the new index
Returns: An updated version of the given multiindex with indexes ordered as specified by the set of index names where indexes not yet contained in the multiindex are added but have all missing values.
Return type: MultiIndex
For the following examples, let multiindex be the following column multiindex
letter   A  B  A  B  A  A
number   1  1  1  2  2  2
blah     0  0  A  A  C  C
Example: Input:
- index_names: [number, missing, blah, letter]
Output:
number   1     1     1     2     2     2
missing  None  None  None  None  None  None
letter   A     B     A     B     A     A
blah     0     0     A     A     C     C
Example: Input:
- index_names: [number, missing]
Output:
number   1     1     1     2     2     2
missing  None  None  None  None  None  None
-
static
concat_index_levels
(midx, sep='_', filter_blanks=True, name=None)¶ Takes all the levels of a multiindex, converts them to strings, and concatenates them into a plain Index.
Parameters: - midx –
- sep (str) – Separator
- filter_blanks – Do we only put sep between non-blank entries?
- name – Final name for the Index
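The flattening can be sketched by joining the entries of each multiindex tuple. The helper name `concat_levels_sketch` is hypothetical, and reading filter_blanks as "skip empty strings before joining" is an interpretation of the description above.

```python
import pandas as pd


def concat_levels_sketch(midx, sep="_", filter_blanks=True, name=None):
    """Hypothetical stand-in: join each multiindex entry into one string."""
    joined = []
    for entry in midx:  # iterating a MultiIndex yields one tuple per position
        parts = [str(part) for part in entry]
        if filter_blanks:
            parts = [part for part in parts if part != ""]
        joined.append(sep.join(parts))
    return pd.Index(joined, name=name)


midx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("b", ""), ("", "z")], names=["outer", "inner"])
flat = concat_levels_sketch(midx, name="key")
unfiltered = concat_levels_sketch(midx, filter_blanks=False)
```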
-
static
fillna_multiindex
(idx, value='', level_nums=None)¶ Sometimes a level l in a multi-index has NaNs, which are stored as -1 in .labels[l] for values not in .levels[l].
.get_level_values(l) #will return things with math.nan
.fillna()/.hasnans #aren't implemented
df.index.set_levels([l.fillna(value) for l in df.index.levels], inplace=True) #Doesn't work
index.get_level_values(l).fillna('') #returns some right
df.index.set_levels([df.index.get_level_values(l).fillna('') for l in range(len(df.index.levels))], inplace=True) #Doesn't work
-
static
get_1level_block_indicator
(multiindex, index_name, index_values)¶ Get an indicator array indicating the positions in the multiindex with one of the specified values
Parameters: - multiindex (MultiIndex) – The multiindex for which to get an indicator
- index_name (str) – The name of the index for which to build an indicator for the specified values
- index_values (list(str or int)) – A list of values for which the indicator should be True
Returns: A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations with the specified values.
Return type: numpy.array of bool
For the following examples, let the multiindex be the column index of the following dataframe
site  1  1  2   2   3    NaN
item  A  B  A   B   A    B
      1  2  10  20  100  200
      3  4  30  40  300  400
Example: Input:
- index_name: site
- index_values: [1, 2]
Output:
[True, True, True, True, False, False]
Example: Input:
- index_name: site
- index_values: [NaN]
Output:
[False, False, False, False, True, True]
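A single-level indicator like the first example can be sketched with plain pandas. The column index below uses sites 1, 1, 2, 2, 3, 3 so that NaN handling, which this sketch does not cover, stays out of the picture.

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3, 3], ["A", "B", "A", "B", "A", "B"]],
    names=["site", "item"],
)

# True wherever the "site" level holds one of the requested values.
indicator = columns.get_level_values("site").isin([1, 2])
```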
-
static
get_aligned_index
(source_index, target_index_names)¶
-
static
get_levelvalues_by_name
(multiindex, index_name)¶ Return the list of level values for the index with the specified name
Parameters: - multiindex – The multi-index containing the specified target index
- index_name – The name of the index for which to get the level values
Returns: A list of level values for the index with the specified name
site  1  1  2   2   3    3
item  A  B  A   B   A    B
      1  2  10  20  100  200
      3  4  30  40  300  400
index_name = site => [1, 2, 3]
index_name = item => [A, B]
-
static
get_nlevel_block_indicator
(multiindex, index_value_byname)¶ Get an indicator array indicating the positions in the multiindex with the specified values
Parameters: - multiindex (MultiIndex) – The multiindex for which to get an indicator
- index_value_byname (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex.
Returns: A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations with the specified values.
Return type: numpy.array of bool
For the following examples, let the multiindex be the column index of the following dataframe
site  1  1  2   NaN  3    NaN
item  A  B  A   B    A    B
      1  2  10  20   100  200
      3  4  30  40   300  400
Example: Input:
- value_by_name: {site:1, item:B}
Output:
[False, True, False, False, False, False]
Example: Input:
- value_by_name: {site:NaN, item:B}
Output:
[False, False, False, True, False, True]
-
static
get_nlevel_singleton_indicator
(multiindex, value_by_name)¶ Get an indicator array indicating the single position in the multiindex with the specified values
Parameters: - multiindex (MultiIndex) – The multiindex for which to get an indicator
- value_by_name (dict(str, str or int)) – A dictionary containing a mapping of index name to target value for each index in the multiindex
Returns: A boolean array corresponding to the values of the multiindex with a single True value corresponding to the multiindex location with the specified values.
Return type: numpy.array of bool
For the following examples, let the multiindex be the column index of the following dataframe
site  1  1  2   2   3    NaN
item  A  B  A   B   A    B
      1  2  10  20  100  200
      3  4  30  40  300  400
Example: Input:
- value_by_name: {site:1, item:B}
Output:
[False, True, False, False, False, False]
Example: Input:
- value_by_name: {site:NaN, item:B}
Output:
[False, False, False, False, False, True]
-
static
get_some_levels_block_indicator
(multiindex, index_value_byname)¶ Get an indicator array indicating the positions in the multiindex with the specified values
Parameters: - multiindex – The multiindex for which to get an indicator
- index_value_byname – A dictionary containing a mapping of index name to target value for some (not necessarily all) indices in the multiindex.
Returns: A boolean array corresponding to the values of the multiindex, with True values at the multiindex locations with the specified values.
site  1  1  2   2   NaN  NaN
item  A  B  A   B   A    B
      1  2  10  20  100  200
      3  4  30  40  300  400
{site:1, item:B} => [False, True, False, False, False, False]
{site:NaN, item:B} => [False, False, False, False, True, True]
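Matching on several levels at once can be sketched by AND-ing one mask per named level. The helper name `levels_indicator_sketch` is hypothetical, and NaN targets are not handled here because NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd


def levels_indicator_sketch(multiindex, index_value_byname):
    """Hypothetical stand-in: True where every named level equals its target."""
    mask = np.ones(len(multiindex), dtype=bool)
    for name, value in index_value_byname.items():
        mask &= np.asarray(multiindex.get_level_values(name) == value)
    return mask


columns = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3, 3], ["A", "B", "A", "B", "A", "B"]],
    names=["site", "item"],
)
both_levels = levels_indicator_sketch(columns, {"site": 1, "item": "B"})
one_level = levels_indicator_sketch(columns, {"item": "B"})
```

Levels that are not mentioned in the dictionary place no constraint on the result.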
-
static
midx_from_list_of_namedarray
(na_list)¶ Create a MultiIndex from a list of Index or pd.Series objects
Parameters: na_list – list of Index or pd.Series objects (each needs .values and .name)
-
static
rename_mi_level_names
(idx, level, dict_map)¶
-
static
to_frame
(midx, index=True)¶
-
msecoreml.pdonehotencoder module¶
-
class
msecoreml.pdonehotencoder.
PdOneHotEncoder
(drop_base=True, base_criteria='freq')¶ Bases:
msecoreml.basetransformer.BaseTransformer
Wrapper for pandas.Series One-Hot encoder
The transformed values will adhere to the following:
- The row index of the returned dataframe will equal that of the given series (the series to be transformed)
- The column index of the returned dataframe will contain a single index with name equal to the name of the series and the level of each column corresponding the value of the item that column encodes.
Note:
- The empty string is encoded like any other string.
- None is treated as missing and is not encoded.
Letter  Number  ValueToEncode
A       1       “Coke”
A       2       “Pepsi”
B       1       “Tab”
B       2       “Pepsi”
C       1       “”
C       2       None
The following dataframe is returned, where the column multi-index contains a single index ValueToEncode with levels [Coke, Pepsi, Tab]
                Value
Letter  Number  “Coke”  “Pepsi”  “Tab”  “”
A       1       1       0        0      0
A       2       0       1        0      0
B       1       0       0        1      0
B       2       0       1        0      0
C       1       0       0        0      1
C       2       0       0        0      0
-
BASE_FREQ
= 'freq'¶
-
__init__
(drop_base=True, base_criteria='freq')¶ Parameters: - drop_base – When returning the transformation, whether to include one-hots for all values or for all-minus-one. The latter is sometimes called “dummy variables”, in contrast to “normal” one-hot encoding, which includes all.
- base_criteria – How to determine the base. Options: freq (string): the modal value, which works better with penalization. <number> (number as string): the value to be the base.
-
base_criteria
¶
-
drop_base
¶
-
fit
(series)¶ Fit the encoder to the given series
Parameters: series – The series to which to fit an encoding
-
fit_transform
(series)¶ Fit an encoder to the given series and return the encoded values of the series
Parameters: series – The series to which to fit an encoding and then transform. Returns: The dataframe resulting from encoding the given series
-
labels
¶
-
transform
(series)¶ Encode the given series using the previously fit encoding
Parameters: series – The series to encode Returns: The dataframe resulting from encoding the given series
If a new value is encountered in the series (i.e., a value that was not present in the series used to fit the encoder), the resulting encoding for that value will be all zeroes. For example,
Fit:
ValueToEncode ==> ValueToEncode  Coke  Pepsi
                  Coke           1     0
                  Pepsi          0     1
                  Pepsi          0     1
Transform:
ValueToEncode ==> ValueToEncode  Coke  Pepsi
                  Coke           1     0
                  Tab            0     0
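The all-zeroes behaviour for unseen values can be reproduced in plain pandas by fixing the categories at fit time. This is a sketch of the idea and does not use PdOneHotEncoder; the drop_base handling is left out.

```python
import pandas as pd

fit_series = pd.Series(["Coke", "Pepsi", "Pepsi"], name="ValueToEncode")
# "Fit": remember the values seen in the training series.
categories = sorted(fit_series.dropna().unique())

new_series = pd.Series(["Coke", "Tab"], name="ValueToEncode")
# "Transform": values outside the fitted categories become missing,
# and get_dummies encodes a missing value as a row of zeroes.
as_category = new_series.astype(pd.CategoricalDtype(categories))
encoded = pd.get_dummies(as_category).astype(int)
```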
msecoreml.pdseriesex module¶
-
class
msecoreml.pdseriesex.
PdSeriesEx
¶ Bases:
object
-
static
avg_across_series
(series_collection)¶ Create a series by taking the average across the given set of series
Parameters: series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]] Returns: The elementwise average of the series [(x1₀ + x2₀ + .. + xK₀) / K, (x1₁ + x2₁ + .. + xK₁) / K,… ]
-
static
concat_along_rows
(series, levels=None, name=None)¶ Create a dataframe combining all the given series
Parameters: - series – List of series
- levels –
- name –
Returns: A single dataframe containing all the given columns
-
static
exp_moving_avg
(series, halflife, min_periods)¶ Compute the Exponential Weighted Moving Average for each instance (x) using the following weights (w):
wᵢ := (exp(log(0.5) / halflife))ⁱ
exp_rolling_avg = (w₀·xₜ + w₁·xₜ₋₁ + … + wₜ·x₀) / (w₀ + w₁ + … + wₜ)
Parameters: - series – data
- halflife – The period of time for the exponential weight to reduce to one half
- min_periods – The minimum number of non-missing values in the window required (NaN returned otherwise)
Returns: A single series where each instance is the Exponential Weighted Moving Average of the series up to and including that instance
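The weighting can be checked numerically against pandas' ewm, which accepts the same halflife and min_periods arguments. Whether the library calls ewm internally is an assumption; the manual computation pairs weight wᵢ with the value i steps back from the current instance.

```python
import numpy as np
import pandas as pd

series = pd.Series([1.0, 2.0, 4.0])
halflife = 2.0

ewma = series.ewm(halflife=halflife, min_periods=1).mean()

# Manual value for the last instance: w_i = decay**i applies to the value
# i steps back, so the newest value gets weight w_0 = 1.
decay = np.exp(np.log(0.5) / halflife)
weights = decay ** np.arange(len(series))
newest_first = series.to_numpy()[::-1]
manual_last = np.dot(weights, newest_first) / weights.sum()
```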
-
static
gen_fore_diffs
(series, fore)¶ Create a dataframe where each column corresponds to the difference of an offset series with the given series
Parameters: - series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
- fore – The maximum number of values by which to shift the series
Returns: A dataframe with #fore columns where column i is the difference of the left-shifted series (shift of size fore - i) and the given series
if n = 3 and fore = 2
series = [x₀, x₁, x₂]
col₀ = [x₂-x₀, NA, NA]
col₁ = [x₁-x₀, x₂-x₁, NA]
In general for a given fore = l and a series of length n, colₖ = [xₗ₋ₖ-x₀, xₗ₋ₖ₊₁-x₁, …, NA]
-
static
gen_lag_diffs
(series, lag)¶ Create a dataframe where each column corresponds to the difference of a given series and an offset series
Parameters: - series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
- lag – The maximum number of values by which to shift the series
Returns: A dataframe with #lag columns where column i is the difference of the series and a right shift of size (lag - i) of the given series:
if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₂-x₀]
col₁ = [NA, x₁-x₀, x₂-x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, …, NA, xₗ₋ₖ-x₀, …, xₙ₋₁-xₙ₋ₗ₋₁₊ₖ]
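The n = 3, lag = 2 example can be reproduced with plain pandas shifts. The column names `col0` and `col1` are placeholders; the names the library assigns are not documented here.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 4.0])
lag = 2

# Column i subtracts a right shift of size (lag - i) from the series.
lag_diffs = pd.DataFrame({
    f"col{i}": series - series.shift(lag - i) for i in range(lag)
})
```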
-
static
gen_lags
(series, lag)¶ Create a dataframe where each column corresponds to an offset series of the given series
Parameters: - series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
- lag – The maximum number of values by which to shift the series
Returns: A dataframe with #lag columns where column i is a right shift of size (lag - i) of the given series:
if n = 3 and lag = 2
series = [x₀, x₁, x₂]
col₀ = [NA, NA, x₀]
col₁ = [NA, x₀, x₁]
In general for a given lag = l and a series of length n, colₖ = [NA, NA, …, x₀, …, xₙ₋ₗ₋₁₊ₖ]
-
static
gen_leads
(series, lead)¶
-
static
gen_log_series
(series)¶ Create a series by taking the log of each instance in the given series.
Parameters: series – A 1-dimensional pandas series [x₀, x₁, … xₙ] Returns: A series of the log values of the given series [log(x₀), log(x₁), …, log(xₙ)]
-
static
gen_scaled_series
(series, scales)¶ Create a series by multiplying a given series by a corresponding series of scale values
Parameters: - series – A 1-dimensional pandas series [x₀, x₁, … xₙ]
- scales – A 1-dimensional pandas series [s₀, s₁, … sₙ]
Returns: The pairwise product of the two series [x₀s₀, x₁s₁, …, xₙsₙ]
-
static
get_nan_inf_indicator
(series)¶
-
static
root_meansqr
(series)¶
-
static
sum_across_series
(series_collection)¶ Create a series by taking the sum across the given set of series
Parameters: series_collection – A list of 1-dimensional pandas series [[x1₀, x1₁, … x1ₙ], [x2₀, x2₁, … x2ₙ], … [xK₀, xK₁, … xKₙ]] Returns: The elementwise sum of the series [(x1₀ + x2₀ + .. + xK₀), (x1₁ + x2₁ + .. + xK₁),… ]
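Elementwise sums and averages across a list of series can be sketched by lining the series up as columns. This assumes all series share the same index and uses plain pandas rather than PdSeriesEx.

```python
import pandas as pd

collection = [
    pd.Series([1.0, 2.0]),
    pd.Series([3.0, 4.0]),
    pd.Series([5.0, 9.0]),
]

# One column per input series, aligned on the shared index.
side_by_side = pd.concat(collection, axis=1)
total = side_by_side.sum(axis=1)     # elementwise sum across the K series
average = side_by_side.mean(axis=1)  # elementwise sum divided by K
```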
-