pricingengine.models package

Submodules

pricingengine.models.boosting module

class pricingengine.models.boosting.BoostedTrees(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn GradientBoostingRegressor model

__init__(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)
Parameters:
  • learning_rate – Shrinkage (regularization) parameter; scales the contribution of each tree
  • n_estimators – Number of boosting stages (trees)
  • max_depth – Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
  • random_state – Random seed, for reproducible fits
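The role of learning_rate as shrinkage can be illustrated with a minimal, pure-Python sketch of least-squares gradient boosting with depth-1 trees (stumps). scikit-learn's GradientBoostingRegressor with the default squared-error loss works along these lines, but this is an illustration, not its code; the helpers fit_stump, boost, and train_mse are hypothetical.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r by least squares.

    Needs at least two distinct values in x.
    """
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda xi: lm if xi <= thr else rm


def boost(x, y, learning_rate=0.1, n_estimators=100):
    """Least-squares boosting: each stump fits the current residuals and
    its contribution is shrunk by learning_rate."""
    base = sum(y) / len(y)  # initial prediction: the mean of y
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def train_mse(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 10, 10, 10]
few = boost(x, y, learning_rate=0.1, n_estimators=5)
many = boost(x, y, learning_rate=0.1, n_estimators=100)
print(train_mse(few, x, y), train_mse(many, x, y))
```

With a small learning_rate each tree only removes a fraction of the remaining residual, so more trees (a larger n_estimators) are needed to reach the same training fit; that trade-off is why the two parameters are usually tuned together.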

pricingengine.models.causalmodel module

class pricingengine.models.causalmodel.CausalModel

Bases: pricingengine.models.linearmodel.LinearModel

Abstract base class for all linear causal models

SE_NAME = 'se'
__init__()
fit(mtx_x, vec_y, cluster_groups=None)

Reimplements Model.fit to accept the additional cluster_groups parameter.

Parameters:
  • mtx_x – A pandas DataFrame of m rows by n columns
  • vec_y – A pandas Series of length m
  • cluster_groups – Optional pandas Series of length m giving the group ID of each observation, used for clustering
get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()

Return the variance matrix of the computed model
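By the usual convention, standard errors are the square roots of the diagonal of the coefficient variance-covariance matrix. A minimal sketch, assuming get_variance_matrix returns such a matrix; a plain list of lists stands in here for whatever container the class actually returns, and standard_errors is a hypothetical helper.

```python
import math


def standard_errors(variance_matrix):
    """Standard errors = square roots of the diagonal of a variance-covariance matrix."""
    return [math.sqrt(variance_matrix[i][i]) for i in range(len(variance_matrix))]


# Hypothetical 2x2 variance-covariance matrix for coefficients (const, price)
vcov = [[0.25, 0.01],
        [0.01, 0.0625]]
se = standard_errors(vcov)  # [0.5, 0.25]
```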

pricingengine.models.debiased_lasso module

class pricingengine.models.debiased_lasso.DebiasedLasso(allowed_bias=None, lasso_tol=0.001)

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(allowed_bias=None, lasso_tol=0.001)
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()

pricingengine.models.ensemble module

class pricingengine.models.ensemble.BucketSS(models, num_splits)

Bases: pricingengine.models.model.SampleSplitModel

A "bucket of models" simply picks the single best submodel by its MSE. The final MSE is in-sample.
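Choosing the bucket-of-models winner amounts to an argmin over submodel MSEs. A minimal sketch, assuming the submodel predictions have already been computed; the model names, data, and the helpers mse and pick_best are hypothetical, and BucketSS may compute the selection MSE differently.

```python
def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def pick_best(predictions, y):
    """Return the name of the submodel whose predictions have the lowest MSE."""
    return min(predictions, key=lambda name: mse(y, predictions[name]))


y = [1.0, 2.0, 3.0, 4.0]
# Hypothetical out-of-fold predictions from three submodels
predictions = {
    "lasso": [1.1, 1.9, 3.2, 3.9],
    "boosted_trees": [1.5, 2.5, 2.5, 3.5],
    "random_forest": [0.0, 2.0, 4.0, 4.0],
}
best = pick_best(predictions, y)  # "lasso"
```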

__init__(models, num_splits)
static gen_best_ensemble(models, num_splits, norm_weights_one=False)

The “Best” model from CCDDHNR 2017 is gen_best_ensemble([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True)

class pricingengine.models.ensemble.CrossFitContainer(base_models)

Bases: pricingengine.models.model.SampleSplitModel

A convenient container for separate copies of a model, each trained on a different fold of the data. The copies are usually instances of the same class. The MSE is out-of-sample.
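The out-of-sample MSE comes from cross-fitting: each copy is trained on every fold except one and predicts only on the fold it did not see. A minimal pure-Python sketch of that idea; the round-robin fold assignment, the trivial mean model, and the helper names are hypothetical stand-ins, not what CrossFitContainer does internally.

```python
def assign_folds(n, n_folds):
    """Round-robin fold labels: row i goes to fold i % n_folds (real code may shuffle)."""
    return [i % n_folds for i in range(n)]


def fit_mean(xs, ys):
    """A trivial stand-in model that ignores the features and predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda _x: m


def cross_fit_predict(x, y, folds, fit=fit_mean):
    """Out-of-fold predictions: rows in fold k are predicted by a copy of the
    model trained on every fold except k."""
    preds = [None] * len(y)
    for k in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != k]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i, f in enumerate(folds):
            if f == k:
                preds[i] = model(x[i])
    return preds


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 4.0]
folds = assign_folds(len(y), 2)       # [0, 1, 0, 1]
oof = cross_fit_predict(x, y, folds)  # [3.0, 2.0, 3.0, 2.0]
oos_mse = sum((a - b) ** 2 for a, b in zip(y, oof)) / len(y)
```

Here the out-of-sample MSE is 2.0, while the same mean model evaluated in-sample (mean 2.5) has an MSE of 1.25, which is why the in-sample versus out-of-sample distinction in these docstrings matters.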

__init__(base_models)
base_models

Access list of base models

get_feature_info(avg_splits=False)

Return a DataFrame with one row per fold (split) and one column per variable

static wrap_generic_if_needed(base_model, n_folds)
static wrap_single_model_if_needed(base_model, n_folds)

class pricingengine.models.ensemble.StackedSS(models, num_splits, norm_weights_one=False)

Bases: pricingengine.models.model.SampleSplitModel

Stacked generalization (stacking) weights the submodels by regressing the outcome on the submodels’ out-of-sample (OOS) predictions. Some of the literature uses leave-one-out predictions for this; we use out-of-fold predictions instead. (NB: a “committee” would use equal weights.) The “Ensemble” model from CCDDHNR 2017 is StackedSS([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True). The MSE is in-sample. For multi-task learning, it assumes equal weights among outcomes.
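The stacking step can be sketched as a least-squares regression of the outcome on the submodels' out-of-fold predictions, solving the normal equations (P'P)w = P'y. This pure-Python sketch assumes a single outcome, no intercept, and no constraint on the weights; it does not model norm_weights_one, and the helpers solve and stacking_weights are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def stacking_weights(oof_preds, y):
    """No-intercept least-squares weights from regressing y on the
    submodels' out-of-fold predictions (one list per submodel)."""
    k = len(oof_preds)
    ptp = [[sum(a * b for a, b in zip(oof_preds[i], oof_preds[j])) for j in range(k)]
           for i in range(k)]
    pty = [sum(a * b for a, b in zip(oof_preds[i], y)) for i in range(k)]
    return solve(ptp, pty)


# Hypothetical OOF predictions; y is exactly 0.25 * p1 + 0.75 * p2
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [2.0, 1.0, 4.0, 3.0]
y = [1.75, 1.25, 3.75, 3.25]
weights = stacking_weights([p1, p2], y)  # approximately [0.25, 0.75]
```

The stacked prediction for a new row is then the weighted sum of the submodels' predictions; a committee would instead fix every weight at 1/k.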

__init__(models, num_splits, norm_weights_one=False)

pricingengine.models.fast_debiased_lasso module

class pricingengine.models.fast_debiased_lasso.FastDebiasedLasso(lasso_tol=0.001)

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(lasso_tol=0.001)
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()

pricingengine.models.lasso module

class pricingengine.models.lasso.Lasso(alpha=1.0)

Bases: pricingengine.models.linearmodel.LinearModel

Wrapper around the scikit-learn Lasso model

__init__(alpha=1.0)
get_coefficients()

Return the coefficients of the computed model

class pricingengine.models.lasso.LassoCV(tol=0.0001, max_iter=1000)

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(tol=0.0001, max_iter=1000)
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()

Return the variance matrix of the computed model

variance

Return the underlying results object (contains more statistical properties)

pricingengine.models.linearmodel module

class pricingengine.models.linearmodel.LinearModel

Bases: pricingengine.models.model.Model

Abstract base class for all linear models

COEF_NAME = 'coef'
__init__()
get_coefficients()

Return the coefficients of the computed model

pricingengine.models.model module

class pricingengine.models.model.Model

Bases: object

Abstract base class for all basic models. It handles:
  • removing incomplete observations
  • making sure the return type has the appropriate container (e.g., a pandas Index)
  • type checks
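The bookkeeping for incomplete observations can be sketched in pure Python: build a mask of complete rows, fit and predict on those rows only, then scatter the predictions back into a full-length result with NaN elsewhere. This is a hypothetical analogue of what Model and its static put_prediction_in_whole helper may do, not their actual code; complete_mask and put_back are illustrative names.

```python
import math


def complete_mask(x_rows, y):
    """True for rows where the outcome and every feature are present (not None or NaN)."""
    def ok(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))
    return [ok(yi) and all(ok(v) for v in row) for row, yi in zip(x_rows, y)]


def put_back(prediction, full_n, inclusion_mask):
    """Scatter predictions for the included rows into a length-full_n list,
    with NaN for excluded rows."""
    out = [math.nan] * full_n
    it = iter(prediction)
    for i, keep in enumerate(inclusion_mask):
        if keep:
            out[i] = next(it)
    return out


x_rows = [[1.0], [math.nan], [3.0], [4.0]]
y = [2.0, 3.0, None, 8.0]
mask = complete_mask(x_rows, y)  # [True, False, False, True]
# Stand-in for a fitted model (y = 2x), applied only to complete rows
preds = [2 * row[0] for row, keep in zip(x_rows, mask) if keep]
full = put_back(preds, len(y), mask)  # [2.0, nan, nan, 8.0]
```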

AVG_ERROR_COL_NAME = 'avg error'
CONST_NAME = 'const'
ERROR_VAR_NAME = 'error'
MAE_NAME = 'MAE'
MSE_NAME = 'MSE'
ONLY_SUBMODEL = 'only'
PREDICTION_COL_NAME = 'prediction'
R2_NAME = 'R2'
RESIDUAL_COL_NAME = 'residual'
SMAPE_NAME = 'sMAPE'
SUBMODEL_COLNAME = 'Submodel'
VARIABLE_COLNAME = 'variable'
__init__()
coeff_of_determination

r^2/R^2: Coefficient of determination. Ignores NaNs.

copy()

Return a copy of the model object

fit(x, y)

Fit the model to the given training and target data

Parameters:
  • x – A pandas DataFrame of m rows by n columns
  • y – A pandas Series of length m
fit_and_predict(x, y)

A combined fit and predict that may be somewhat more data-efficient. Sets accuracy statistics. Currently, it does not predict for observations where the outcome is missing.

Parameters:
  • x – matrix of features
  • y – vector of realized outcomes
fit_predict_and_residualize(x, y)

Fit the model, predict, and compute residuals. Sets accuracy statistics.

static get_flat_array(data)
static get_possible_col_index(data)
mae

Mean Absolute Error (MAE). Ignores NaNs.

mse

Mean Squared Error (MSE), also called mean squared prediction error (MSPE). Ignores NaNs.

n_tasks

Return the number of “tasks”. A value of 1 indicates standard estimation; larger values indicate multi-task learning.

static n_tasks_from_y(y)
predict(x, y_true=None)

Apply the previously fitted model to the given data

Parameters:
  • x – A pandas DataFrame of k rows and n columns
  • y_true – If passed in, sets accuracy statistics
Returns:

The pandas Series of length k resulting from applying the fitted model to the given values

predict_and_residualize(x, y)

Compute residuals using a prefit model. Sets accuracy statistics.

Parameters:
  • x – matrix of features
  • y – vector of realized outcomes
static put_prediction_in_whole(prediction, full_n, inclusion_mask)
rmse

Root Mean Squared Error (RMSE), also called root mean squared prediction error (RMSPE). Ignores NaNs.

smape

Symmetric Mean Absolute Percentage Error (sMAPE). Use this rather than MAPE, which can be infinite when an actual value is zero. Ignores NaNs.
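The accuracy statistics above can be sketched in pure Python, dropping any position where the actual or predicted value is NaN. The sMAPE definition used here, the mean of |y - f| / ((|y| + |f|) / 2), is one common variant and an assumption; others multiply by 100 or omit the division by 2, and the package may use either. All function names are hypothetical.

```python
import math


def _pairs(y, pred):
    """Keep only positions where both values are non-NaN."""
    return [(a, b) for a, b in zip(y, pred) if not (math.isnan(a) or math.isnan(b))]


def mae(y, pred):
    p = _pairs(y, pred)
    return sum(abs(a - b) for a, b in p) / len(p)


def mse(y, pred):
    p = _pairs(y, pred)
    return sum((a - b) ** 2 for a, b in p) / len(p)


def rmse(y, pred):
    return math.sqrt(mse(y, pred))


def smape(y, pred):
    """Symmetric MAPE: mean of |y - f| / ((|y| + |f|) / 2).
    A pair with y == f == 0 counts as zero error (an assumption)."""
    p = _pairs(y, pred)
    terms = [0.0 if a == b == 0 else abs(a - b) / ((abs(a) + abs(b)) / 2) for a, b in p]
    return sum(terms) / len(terms)


def r2(y, pred):
    p = _pairs(y, pred)
    mean_y = sum(a for a, _ in p) / len(p)
    ss_res = sum((a - b) ** 2 for a, b in p)
    ss_tot = sum((a - mean_y) ** 2 for a, _ in p)
    return 1 - ss_res / ss_tot


y = [0.0, 2.0, 4.0, math.nan]  # the last position is ignored
pred = [1.0, 2.0, 3.0, 5.0]
stats = {"MAE": mae(y, pred), "MSE": mse(y, pred), "RMSE": rmse(y, pred),
         "sMAPE": smape(y, pred), "R2": r2(y, pred)}
```

The first pair has an actual value of 0, so its MAPE term would be infinite, while its sMAPE term is the finite value 2.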

static wrap_y_if_needed(y, x, name)
x_column_index

Return the column Index/MultiIndex from the training data

class pricingengine.models.model.SampleSplitModel(num_splits)

Bases: object

Abstract base class (ABC) for models that keep track of which parts of the data were used to fit which components. The most common case is several ML models that are each fit on one portion of the data and then used to predict on another. It handles:
  • removing incomplete observations
  • making sure the return type has the appropriate container (e.g., a pandas Index)
  • type checks

FOLD_COLNAME = 'fold'
NO_SPLIT = 'no split'
RECORDER_TRAININGON_NAME = 'trainingOn'
__init__(num_splits)
coeff_of_determination

r^2/R^2: Coefficient of determination. Ignores NaNs.

fit(x, y, folds)

Fit the model to the given training and target data

Parameters:
  • x – A pandas DataFrame of m rows by n columns
  • y – A pandas Series of length m
  • folds
fit_and_predict(x, y, folds)

A combined fit and predict that may be somewhat more data-efficient. Sets accuracy statistics. Currently, it does not predict for observations where the outcome is missing.

Parameters:
  • x – matrix of features
  • y – vector of realized outcomes
  • folds
static fit_and_predict_mult(ssmodels, features, y_multi, folds)
Parameters:
  • ssmodels (dict) –
  • features (dict) –
  • y_multi (dict) –
  • folds
fit_predict_and_residualize(x, y, folds)

Fit the model, predict, and compute residuals. Sets accuracy statistics.

get_fit_diagnostics()

Return a single-row DataFrame of MAE, MSE, sMAPE, and R2.

mae

Mean Absolute Error (MAE). Ignores NaNs.

mse

Mean Squared Error (MSE), also called mean squared prediction error (MSPE). Ignores NaNs.

n_tasks

Return the number of “tasks”. A value of 1 indicates standard estimation; larger values indicate multi-task learning.

num_splits

Number of divisions into which to split data for computing residuals

predict(x, folds=None, y_true=None)
Parameters:
  • x
  • folds – If None, assumes that none of this data was used for fitting.
  • y_true – If passed in, sets accuracy statistics
predict_and_residualize(x, y, folds=None)

Compute residuals using a prefit model. Sets accuracy statistics.

static predict_mult(ssmodels, features, folds)
Parameters:
  • ssmodels (dict) –
  • features (dict) –
  • folds
rmse

Root Mean Squared Error (RMSE), also called root mean squared prediction error (RMSPE). Ignores NaNs.

smape

Symmetric Mean Absolute Percentage Error (sMAPE). Use this rather than MAPE, which can be infinite when an actual value is zero. Ignores NaNs.

x_column_index

Return the column Index/MultiIndex from the training data

pricingengine.models.multitask_ols module

class pricingengine.models.multitask_ols.MultiTaskOLS(add_const=False)

Bases: pricingengine.models.model.Model

__init__(add_const=False)

pricingengine.models.neural_net module

class pricingengine.models.neural_net.NeuralNet(hidden_layer_sizes=(2, ), max_iter=200)

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn MLPRegressor

__init__(hidden_layer_sizes=(2, ), max_iter=200)

pricingengine.models.ols module

class pricingengine.models.ols.OLS(add_const=False)

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the statsmodels OLS model

__init__(add_const=False)
static add_const(mtx_x)
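With add_const=True, a column of ones is presumably added so that the regression has an intercept (statsmodels' add_constant does this, and Model.CONST_NAME is 'const'). A pure-Python sketch of that idea with a two-column OLS solved from the normal equations; add_const and ols_two_columns here are hypothetical helpers, not the statsmodels code that OLS wraps.

```python
def add_const(rows):
    """Prepend a constant column of 1.0 to each row of a feature matrix."""
    return [[1.0] + list(row) for row in rows]


def ols_two_columns(rows, y):
    """OLS for a design with exactly two columns [const, x], solved from the
    2x2 normal equations (X'X) b = X'y by Cramer's rule."""
    s00 = sum(r[0] * r[0] for r in rows)
    s01 = sum(r[0] * r[1] for r in rows)
    s11 = sum(r[1] * r[1] for r in rows)
    t0 = sum(r[0] * yi for r, yi in zip(rows, y))
    t1 = sum(r[1] * yi for r, yi in zip(rows, y))
    det = s00 * s11 - s01 * s01
    return [(t0 * s11 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]


x_rows = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # exactly 1 + 2x
coefs = ols_two_columns(add_const(x_rows), y)  # [1.0, 2.0]
```

Without the constant, the least-squares slope through the origin would be 70/30 (about 2.33), so omitting add_const biases the slope whenever the true intercept is nonzero.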
add_est_results

Return the underlying results object (contains more statistical properties)

fit(mtx_x, vec_y, cluster_groups=None)
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()
predict(mtx_x, vec_y_true=None)

Overrides the inherited predict method.

pricingengine.models.post_lasso module

class pricingengine.models.post_lasso.PostLasso(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper that fits a statsmodels OLS on the variables selected by a scikit-learn LassoCV model
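Post-lasso uses the lasso only to select variables and then re-estimates the selected coefficients by OLS, which removes the lasso's shrinkage of those coefficients. A pure-Python sketch of the two steps, assuming a fixed alpha (PostLasso itself uses a cross-validated LassoCV) and no intercept; the coordinate-descent solver minimizes (1/(2n))·||y − Xb||² + alpha·||b||₁ and stands in for both the scikit-learn and statsmodels steps. All helper names are hypothetical.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, alpha, n_sweeps=200):
    """Coordinate-descent lasso without intercept; X is given as a list of columns.
    alpha=0 gives ordinary least squares."""
    n, k = len(y), len(cols)
    b = [0.0] * k
    for _ in range(n_sweeps):
        for j in range(k):
            # Partial residual that excludes feature j
            r = [y[i] - sum(cols[m][i] * b[m] for m in range(k) if m != j) for i in range(n)]
            rho = sum(cols[j][i] * r[i] for i in range(n)) / n
            z = sum(v * v for v in cols[j]) / n
            b[j] = soft_threshold(rho, alpha) / z
    return b


def post_lasso(cols, y, alpha):
    """Select features with nonzero lasso coefficients, then refit them by OLS."""
    lasso_b = lasso_cd(cols, y, alpha)
    support = [j for j, bj in enumerate(lasso_b) if bj != 0.0]
    ols_b = lasso_cd([cols[j] for j in support], y, alpha=0.0)
    coef = [0.0] * len(cols)
    for j, bj in zip(support, ols_b):
        coef[j] = bj
    return lasso_b, coef


x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]  # irrelevant feature
y = [2.0, -2.0, 2.0, -2.0]   # exactly 2 * x1
lasso_b, coef = post_lasso([x1, x2], y, alpha=0.5)
# lasso_b == [1.5, 0.0] (shrunk toward 0); coef == [2.0, 0.0] (refit by OLS)
```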

__init__(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))
add_est_results

Return the underlying results object (contains more statistical properties)

get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()

pricingengine.models.prepredicted module

class pricingengine.models.prepredicted.PrePredicted(prediction)

Bases: pricingengine.models.model.Model

Currently only works for standard (non-multi-task learning) models

__init__(prediction)
Parameters: prediction – Must have the same index as the DataFrame used for prediction

class pricingengine.models.prepredicted.SSPrePredicted(value)

Bases: pricingengine.models.model.SampleSplitModel

Store results for a model that has already been fit and predicted. This is used for baseline models to allow us to separate that part of the analysis pipeline.

To store the data from a built-in DDML model:

full_baseline_preds = pd.DataFrame()
ddml.predict(ds, ret_pred=full_baseline_preds)
PrePredicted.write_pred_df_to_csv(recorder_fname, full_baseline_preds)

The file's first columns are the data (observation) index, followed by the columns Variable, Lead, and Prediction. Rows in the recorder file are uniquely identified by (data-index, Variable, Lead).
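To make the layout concrete, the sketch below writes and rereads a hypothetical recorder file with Python's csv module. It assumes a single index column named obs (the real file uses the data's own index column or columns) and made-up values; it is not the output of write_pred_df_to_csv, only an illustration of the column order and the uniqueness rule.

```python
import csv
import io

# Hypothetical predictions keyed by (observation index, Variable, Lead)
records = [
    (0, "sales", 1, 10.5),
    (0, "sales", 2, 11.0),
    (1, "sales", 1, 9.75),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["obs", "Variable", "Lead", "Prediction"])  # "obs" stands in for the index column(s)
writer.writerows(records)
text = buf.getvalue()

# Read it back and check that (index, Variable, Lead) uniquely identifies each row
rows = list(csv.DictReader(io.StringIO(text)))
keys = [(r["obs"], r["Variable"], r["Lead"]) for r in rows]
assert len(keys) == len(set(keys))
```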

To load the data from this type of csv:

builtin_predictions = PrePredicted.get_rec_df_from_csv(recorder_fname, schema)
baselines = DynamicDML.gen_prepredicted_baselines(builtin_predictions)
ddml2 = DynamicDML(baseline_model=baselines, ...)

If you estimate one DynamicDML model on the whole dataset and another on a subset (e.g., for out-of-sample evaluation), you will need separate PrePredicted objects, because the folds will differ.

__init__(value)
get_coefficients()
static get_rec_df_from_csv(fname, schema, extra_index_vars=None)
Parameters:
  • fname – Filename
  • schema – Schema
  • extra_index_vars – Examples: [Model.VARIABLE_COLNAME, DynamicDML.LEAD_LEVEL_NAME]
static write_rec_df_to_csv(fname, recs)

pricingengine.models.randomforest module

class pricingengine.models.randomforest.RandomForest(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn RandomForestRegressor model

__init__(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)

pricingengine.models.ridge module

class pricingengine.models.ridge.Ridge(alpha)

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn RidgeCV model

ALPHA_MAX = 999999
ALPHA_MIN = 1e-08
__init__(alpha)
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()
variance

Return the underlying results object (contains more statistical properties)

class pricingengine.models.ridge.RidgeCV

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn RidgeCV model

ALPHA_MAX = 999999
ALPHA_MIN = 1e-08
__init__()
get_coefficients()

Return the coefficients of the computed model

get_standard_errors()

Return the standard errors of the computed model

get_variance_matrix()
variance

Return the underlying results object (contains more statistical properties)

pricingengine.models.zero module

class pricingengine.models.zero.One

Bases: pricingengine.models.model.Model

Model that always fits a constant-only OLS, i.e., predicts the sample mean of the target

__init__()

class pricingengine.models.zero.Zero

Bases: pricingengine.models.model.Model

Model that always returns fitted values of 0, so the residual equals the target. Use it when you don’t want to partial anything out of a variable.
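The effect of these two baseline models on residualization can be sketched in pure Python: fitting zeros leaves the target untouched, while a constant-only OLS fits the sample mean, so its residuals are the demeaned target. The functions below are hypothetical stand-ins, not the Zero and One classes themselves.

```python
def fit_zero(y):
    """Zero-style fit: all fitted values are 0, so residuals equal the target."""
    return [0.0] * len(y)


def fit_one(y):
    """Constant-only OLS: the least-squares fit on an intercept alone is the mean of y."""
    mean_y = sum(y) / len(y)
    return [mean_y] * len(y)


def residualize(y, fitted):
    return [yi - fi for yi, fi in zip(y, fitted)]


y = [2.0, 4.0, 9.0]
zero_resid = residualize(y, fit_zero(y))  # [2.0, 4.0, 9.0], i.e. y itself
one_resid = residualize(y, fit_one(y))    # [-3.0, -1.0, 4.0], i.e. y demeaned
```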

__init__()

Module contents