pricingengine.models package¶

Submodules¶

pricingengine.models.boosting module¶

class pricingengine.models.boosting.BoostedTrees(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)¶

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn GradientBoostingRegressor model

__init__(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)¶

Parameters:
	learning_rate – The shrinkage (regularization) parameter.
	n_estimators – Number of trees.
	max_depth – Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
	random_state –
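To make the roles of learning_rate and n_estimators concrete, the following is a minimal pure-Python sketch of gradient boosting for squared error using one-split stumps on a single feature. It is an illustration only and not the wrapped scikit-learn implementation; the names fit_stump, boost and training_mse are hypothetical and do not exist in pricingengine.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, learning_rate=0.1, n_estimators=100):
    """Each stage fits a stump to the current residuals.

    learning_rate shrinks the contribution of every stage;
    n_estimators is the number of stages (trees).
    """
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def training_mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
few = boost(xs, ys, learning_rate=0.1, n_estimators=5)
many = boost(xs, ys, learning_rate=0.1, n_estimators=200)
print(training_mse(few, xs, ys), training_mse(many, xs, ys))
```

With a small learning_rate, more stages are needed to reach the same training fit, which is why the two parameters are usually tuned together.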

pricingengine.models.causalmodel module¶

class pricingengine.models.causalmodel.CausalModel¶

Bases: pricingengine.models.linearmodel.LinearModel

Abstract base class for all linear causal models

SE_NAME = 'se'¶

__init__()¶

fit(mtx_x, vec_y, cluster_groups=None)¶

Overrides Model.fit to accept the additional cluster_groups parameter

Parameters:
	mtx_x – A pandas DataFrame of m rows by n columns
	vec_y – A pandas Series of length m
	cluster_groups – Optional pandas Series of length m indicating group ids for clustering

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶: Return the variance matrix of the computed model
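For a linear model, standard errors are conventionally the square roots of the diagonal of the coefficient variance matrix. The sketch below shows that relationship with plain lists; whether CausalModel derives one from the other internally is an assumption, and standard_errors_from_variance is a hypothetical helper.

```python
import math


def standard_errors_from_variance(variance_matrix):
    """Square roots of the diagonal entries of a square variance matrix."""
    return [math.sqrt(variance_matrix[i][i]) for i in range(len(variance_matrix))]


# Illustrative 2x2 coefficient variance matrix (made-up numbers).
variance = [[0.04, 0.01],
            [0.01, 0.09]]
standard_errors = standard_errors_from_variance(variance)
print(standard_errors)
```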

pricingengine.models.debiased_lasso module¶

class pricingengine.models.debiased_lasso.DebiasedLasso(allowed_bias=None, lasso_tol=0.001)¶

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(allowed_bias=None, lasso_tol=0.001)¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

pricingengine.models.ensemble module¶

class pricingengine.models.ensemble.BucketSS(models, num_splits)¶

Bases: pricingengine.models.model.SampleSplitModel

A bucket of models selects the single best submodel by submodel MSE. The final MSE is in-sample.

__init__(models, num_splits)¶

static gen_best_ensemble(models, num_splits, norm_weights_one=False)¶: The “Best” model from CCDDHNR 2017 is gen_best_ensemble([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True)
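The bucket-of-models idea can be sketched in a few lines: compute each submodel's MSE and keep the one with the lowest value. This is a simplified illustration with made-up predictions, not the BucketSS implementation; pick_best_model and the dictionary layout are assumptions.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_best_model(y_true, predictions_by_model):
    """Return the name of the submodel with the lowest MSE."""
    return min(
        predictions_by_model,
        key=lambda name: mean_squared_error(y_true, predictions_by_model[name]),
    )


y = [1.0, 2.0, 3.0, 4.0]
# Hypothetical out-of-fold predictions from two submodels.
submodel_predictions = {
    "lasso": [1.2, 1.9, 3.3, 3.8],
    "boosted_trees": [0.5, 2.5, 2.0, 5.0],
}
best = pick_best_model(y, submodel_predictions)
print(best)
```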

class pricingengine.models.ensemble.CrossFitContainer(base_models)¶

Bases: pricingengine.models.model.SampleSplitModel

A convenient way to store separate copies of the same model that are trained on different folds of data. Often they are models of the same class. The MSE is out-of-sample.

__init__(base_models)¶

base_models¶: Access list of base models

get_feature_info(avg_splits=False)¶: Returns a DataFrame with one row per fold/split and one column per variable

static wrap_generic_if_needed(base_model, n_folds)¶

static wrap_single_model_if_needed(base_model, n_folds)¶
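The cross-fitting pattern behind a container of per-fold models can be sketched as follows: for each fold, a fresh copy of the model is fit on the other folds and used to predict the held-out fold, so every prediction is out-of-sample. This is a toy illustration; cross_fit_predict and MeanModel are hypothetical and are not part of pricingengine.

```python
def cross_fit_predict(xs, ys, folds, make_model):
    """Fit one model per fold on the remaining folds; predict the held-out fold."""
    predictions = [None] * len(ys)
    models = {}
    for fold in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != fold]
        held_out = [i for i, f in enumerate(folds) if f == fold]
        model = make_model()
        model.fit([xs[i] for i in train], [ys[i] for i in train])
        models[fold] = model
        for i in held_out:
            predictions[i] = model.predict(xs[i])
    return models, predictions


class MeanModel:
    """Toy stand-in for a base model: always predicts the training mean."""

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)

    def predict(self, x):
        return self.mean


xs = [0, 1, 2, 3]
ys = [1.0, 3.0, 5.0, 7.0]
folds = [0, 0, 1, 1]
models, oos_predictions = cross_fit_predict(xs, ys, folds, MeanModel)
print(oos_predictions)
```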

class pricingengine.models.ensemble.StackedSS(models, num_splits, norm_weights_one=False)¶

Bases: pricingengine.models.model.SampleSplitModel

Stacked generalization (stacking) weights submodels by regressing the outcome on the models’ out-of-sample (OOS) predictions. Some of the literature does this via leave-one-out predictions, but here it is done with out-of-fold predictions. (NB: a “committee” would use equal weights.)

The “Ensemble” model from CCDDHNR 2017 is StackedSS([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True).

The MSE is in-sample. For multi-task learning, equal weights among outcomes are assumed.

__init__(models, num_splits, norm_weights_one=False)¶
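As a rough sketch of stacking, the weights for two submodels can be obtained by a no-intercept least-squares regression of the outcome on their out-of-fold predictions. The code below solves the 2x2 normal equations directly. It is not the StackedSS implementation: stacking_weights is hypothetical, and treating norm_weights_one as "rescale the weights to sum to one" is an assumption about that flag.

```python
def stacking_weights(y, pred_a, pred_b, norm_weights_one=False):
    """Least-squares weights for y ~ w_a * pred_a + w_b * pred_b (no intercept)."""
    aa = sum(a * a for a in pred_a)
    bb = sum(b * b for b in pred_b)
    ab = sum(a * b for a, b in zip(pred_a, pred_b))
    ay = sum(a * t for a, t in zip(pred_a, y))
    by = sum(b * t for b, t in zip(pred_b, y))
    det = aa * bb - ab * ab  # zero if the two prediction vectors are collinear
    w_a = (ay * bb - by * ab) / det
    w_b = (by * aa - ay * ab) / det
    if norm_weights_one:
        # Assumed meaning of the flag: rescale so the weights sum to one.
        total = w_a + w_b
        w_a, w_b = w_a / total, w_b / total
    return w_a, w_b


# Made-up out-of-fold predictions; y is exactly their equal-weight average.
pred_a = [1.0, 2.0, 3.0, 4.0]
pred_b = [2.0, 1.0, 4.0, 3.0]
y = [1.5, 1.5, 3.5, 3.5]
w_a, w_b = stacking_weights(y, pred_a, pred_b, norm_weights_one=True)
print(w_a, w_b)
```

In this constructed example the regression recovers the equal weights of a committee; with real predictions the weights generally differ across submodels.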

pricingengine.models.fast_debiased_lasso module¶

class pricingengine.models.fast_debiased_lasso.FastDebiasedLasso(lasso_tol=0.001)¶

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(lasso_tol=0.001)¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

pricingengine.models.lasso module¶

class pricingengine.models.lasso.Lasso(alpha=1.0)¶

Bases: pricingengine.models.linearmodel.LinearModel

Wrapper around the scikit-learn Lasso model

__init__(alpha=1.0)¶

get_coefficients()¶: Return the coefficients of the computed model

class pricingengine.models.lasso.LassoCV(tol=0.0001, max_iter=1000)¶

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the scikit-learn LassoCV model

__init__(tol=0.0001, max_iter=1000)¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶: Return the variance matrix of the computed model

variance¶: Return the underlying results object (contains more statistical properties)

pricingengine.models.linearmodel module¶

class pricingengine.models.linearmodel.LinearModel¶

Bases: pricingengine.models.model.Model

Abstract base class for all linear models

COEF_NAME = 'coef'¶

__init__()¶

get_coefficients()¶: Return the coefficients of the computed model

pricingengine.models.model module¶

class pricingengine.models.model.Model¶

Bases: object

Abstract base class for all basic models. Will handle:

* removing non-complete observations
* making sure the return type has an appropriate container (e.g. a pandas Index)
* type checks

AVG_ERROR_COL_NAME = 'avg error'¶

CONST_NAME = 'const'¶

ERROR_VAR_NAME = 'error'¶

MAE_NAME = 'MAE'¶

MSE_NAME = 'MSE'¶

ONLY_SUBMODEL = 'only'¶

PREDICTION_COL_NAME = 'prediction'¶

R2_NAME = 'R2'¶

RESIDUAL_COL_NAME = 'residual'¶

SMAPE_NAME = 'sMAPE'¶

SUBMODEL_COLNAME = 'Submodel'¶

VARIABLE_COLNAME = 'variable'¶

__init__()¶

coeff_of_determination¶: r^2/R^2: Coefficient of determination. Will ignore NaNs

copy()¶: Return a copy of the model object

fit(x, y)¶

Fit the model to the given training and target data

Parameters:
	x – A pandas DataFrame of m rows by n columns
	y – A pandas Series of length m

fit_and_predict(x, y)¶

A combined fit and predict that might be a bit more data efficient. Sets accuracy statistics. Currently, will not predict for observations where the outcome is missing.

Parameters:
	x – matrix of features
	y – vector of realized outcomes

fit_predict_and_residualize(x, y)¶: Sets accuracy statistics

static get_flat_array(data)¶

static get_possible_col_index(data)¶

mae¶: Mean Absolute Error (MAE). Will ignore NaNs

mse¶: Mean Squared Error (MSE), also called mean squared prediction error (MSPE). Will ignore NaNs

n_tasks¶: Returns the number of “tasks”. 1 indicates standard estimation. A value greater than 1 indicates multi-task estimation

static n_tasks_from_y(y)¶

predict(x, y_true=None)¶

Transform the given data using the previously fit model

Parameters:
	x – A pandas DataFrame of k rows and n columns
	y_true – If passed in, then sets accuracy statistics
Returns:	The pandas Series of length k resulting from applying the fit model to the given values

predict_and_residualize(x, y)¶

Compute residuals using a prefit model. Sets accuracy statistics.

Parameters:
	x – matrix of features
	y – vector of realized outcomes

static put_prediction_in_whole(prediction, full_n, inclusion_mask)¶
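The signature put_prediction_in_whole(prediction, full_n, inclusion_mask) suggests a scatter step: predictions computed only for the complete observations are placed back into a full-length container, with missing values elsewhere. The sketch below shows one way such a step could work. The behavior is an assumption inferred from the argument names, and scatter_predictions is a hypothetical stand-in rather than the library function.

```python
import math


def scatter_predictions(prediction, full_n, inclusion_mask):
    """Place predictions for included rows into a full-length list, NaN elsewhere."""
    whole = [math.nan] * full_n
    included = [i for i, keep in enumerate(inclusion_mask) if keep]
    for i, value in zip(included, prediction):
        whole[i] = value
    return whole


y = [1.0, math.nan, 3.0]
mask = [not math.isnan(v) for v in y]  # True where the observation is complete
whole = scatter_predictions([0.9, 3.1], len(y), mask)
print(whole)
```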

rmse¶: Root Mean Squared Error (RMSE), also called root mean squared prediction error (RMSPE). Will ignore NaNs

smape¶: Symmetric Mean Absolute Percent Error (sMAPE). Use this rather than MAPE, which can be infinite. Will ignore NaNs
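The accuracy statistics listed for Model (MAE, MSE, RMSE, sMAPE and R2) can be sketched in plain Python as below, dropping any pair that contains a NaN. The formulas are the textbook ones; sMAPE in particular has several variants in the literature, so the definition used here is an assumption and may differ from what pricingengine computes. All function names are hypothetical.

```python
import math


def complete_pairs(y_true, y_pred):
    """Keep only pairs where neither value is NaN."""
    return [(t, p) for t, p in zip(y_true, y_pred)
            if not (math.isnan(t) or math.isnan(p))]


def mae(y_true, y_pred):
    pairs = complete_pairs(y_true, y_pred)
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def mse(y_true, y_pred):
    pairs = complete_pairs(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def smape(y_true, y_pred):
    # One common variant, bounded in [0, 2]; assumes |t| + |p| > 0 for every pair.
    pairs = complete_pairs(y_true, y_pred)
    return sum(2 * abs(t - p) / (abs(t) + abs(p)) for t, p in pairs) / len(pairs)


def r2(y_true, y_pred):
    pairs = complete_pairs(y_true, y_pred)
    mean_true = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, math.nan, 4.0]
y_pred = [1.0, 3.0, 5.0, 4.0]
metrics = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "sMAPE": smape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
print(metrics)
```

Unlike MAPE, which divides by the true value alone and blows up when it is zero, this sMAPE variant divides by the sum of the absolute true and predicted values.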

static wrap_y_if_needed(y, x, name)¶

x_column_index¶: Return the column Index/MultiIndex from the training data

class pricingengine.models.model.SampleSplitModel(num_splits)¶

Bases: object

ABC for models that keep track of which parts of the data were used for fitting certain components. The most common case is that you have several ML models that are fit on one section and then used to predict on another. Will handle:

* removing non-complete observations
* making sure the return type has an appropriate container (e.g. a pandas Index)
* type checks

FOLD_COLNAME = 'fold'¶

NO_SPLIT = 'no split'¶

RECORDER_TRAININGON_NAME = 'trainingOn'¶

__init__(num_splits)¶

coeff_of_determination¶: r^2/R^2: Coefficient of determination. Will ignore NaNs

fit(x, y, folds)¶

Fit the model to the given training and target data

Parameters:
	x – A pandas DataFrame of m rows by n columns
	y – A pandas Series of length m
	folds –

fit_and_predict(x, y, folds)¶

A combined fit and predict that might be a bit more data efficient. Sets accuracy statistics. Currently, will not predict for observations where the outcome is missing.

Parameters:
	x – matrix of features
	y – vector of realized outcomes
	folds –

static fit_and_predict_mult(ssmodels, features, y_multi, folds)¶

Parameters:	ssmodels (dict) – features (dict) – y_multi (dict) – folds –

fit_predict_and_residualize(x, y, folds)¶: Sets accuracy statistics

get_fit_diagnostics()¶: Return a single-row DataFrame of MAE, MSE, sMAPE, and R2

mae¶: Mean Absolute Error (MAE). Will ignore NaNs

mse¶: Mean Squared Error (MSE), also called mean squared prediction error (MSPE). Will ignore NaNs

n_tasks¶: Returns the number of “tasks”. 1 indicates standard estimation. A value greater than 1 indicates multi-task estimation

num_splits¶: Number of divisions into which to split data for computing residuals

predict(x, folds=None, y_true=None)¶

Parameters:
	x –
	folds – If None, then assumes that none of this data was used for fitting.
	y_true – If passed in, then sets accuracy statistics

predict_and_residualize(x, y, folds=None)¶: Sets accuracy statistics

static predict_mult(ssmodels, features, folds)¶

Parameters:	ssmodels (dict) – features (dict) – folds –

rmse¶: Root Mean Squared Error (RMSE), also called root mean squared prediction error (RMSPE). Will ignore NaNs

smape¶: Symmetric Mean Absolute Percent Error (sMAPE). Use this rather than MAPE, which can be infinite. Will ignore NaNs

x_column_index¶: Return the column Index/MultiIndex from the training data

pricingengine.models.multitask_ols module¶

class pricingengine.models.multitask_ols.MultiTaskOLS(add_const=False)¶

Bases: pricingengine.models.model.Model

__init__(add_const=False)¶

pricingengine.models.neural_net module¶

class pricingengine.models.neural_net.NeuralNet(hidden_layer_sizes=(2, ), max_iter=200)¶

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn MLPRegressor

__init__(hidden_layer_sizes=(2, ), max_iter=200)¶

pricingengine.models.ols module¶

class pricingengine.models.ols.OLS(add_const=False)¶

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper around the statsmodels OLS model

__init__(add_const=False)¶

static add_const(mtx_x)¶

add_est_results¶: Return the underlying results object (contains more statistical properties)

fit(mtx_x, vec_y, cluster_groups=None)¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

predict(mtx_x, vec_y_true=None)¶: Overrides the base-class predict

pricingengine.models.post_lasso module¶

class pricingengine.models.post_lasso.PostLasso(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))¶

Bases: pricingengine.models.causalmodel.CausalModel

Wrapper that fits a statsmodels OLS after a scikit-learn LassoCV model

__init__(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))¶

add_est_results¶: Return the underlying results object (contains more statistical properties)

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

pricingengine.models.prepredicted module¶

class pricingengine.models.prepredicted.PrePredicted(prediction)¶

Bases: pricingengine.models.model.Model

Currently only works for standard (non-multi-task learning) models

__init__(prediction)¶

Parameters:	prediction – Must have the same indexing as the df used for prediction

class pricingengine.models.prepredicted.SSPrePredicted(value)¶

Bases: pricingengine.models.model.SampleSplitModel

Store results for a model that has already been fit and predicted. This is used for baseline models to allow us to separate that part of the analysis pipeline.

To store the data from a builtin DDML model:

full_baseline_preds = pd.DataFrame()
ddml.predict(ds, ret_pred=full_baseline_preds)
PrePredicted.write_pred_df_to_csv(recorder_fname, full_baseline_preds)

The file will have the data (observation) index as its first columns, followed by the columns Variable, Lead, Prediction. Rows in the recorder file are uniquely identified by (data-index, Variable, Lead).

To load the data from this type of csv:

builtin_predictions = PrePredicted.get_rec_df_from_csv(recorder_fname, schema)
baselines = DynamicDML.gen_prepredicted_baselines(builtin_predictions)
ddml2 = DynamicDML(baseline_model=baselines, ...)

If you estimate a DynamicDML model on the whole data and then another on a subset (for out-of-sample evaluation), you will need different PrePredicted objects because the folds will be different.

__init__(value)¶

get_coefficients()¶

static get_rec_df_from_csv(fname, schema, extra_index_vars=None)¶

Parameters:
	fname – Filename
	schema – Schema
	extra_index_vars – Examples: [Model.VARIABLE_COLNAME, DynamicDML.LEAD_LEVEL_NAME]

static write_rec_df_to_csv(fname, recs)¶

pricingengine.models.randomforest module¶

class pricingengine.models.randomforest.RandomForest(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)¶

Bases: pricingengine.models.model.Model

Wrapper around the scikit-learn RandomForestRegressor model

__init__(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)¶

pricingengine.models.ridge module¶

class pricingengine.models.ridge.Ridge(alpha)¶

Bases: pricingengine.models.causalmodel.CausalModel

ALPHA_MAX = 999999¶: Wrapper around the scikit-learn RidgeCV model

ALPHA_MIN = 1e-08¶

__init__(alpha)¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

variance¶: Return the underlying results object (contains more statistical properties)

class pricingengine.models.ridge.RidgeCV¶

Bases: pricingengine.models.causalmodel.CausalModel

ALPHA_MAX = 999999¶: Wrapper around the scikit-learn RidgeCV model

ALPHA_MIN = 1e-08¶

__init__()¶

get_coefficients()¶: Return the coefficients of the computed model

get_standard_errors()¶: Return the standard errors of the computed model

get_variance_matrix()¶

variance¶: Return the underlying results object (contains more statistical properties)

pricingengine.models.zero module¶

class pricingengine.models.zero.One¶

Bases: pricingengine.models.model.Model

Always fits a constant OLS

__init__()¶

class pricingengine.models.zero.Zero¶

Bases: pricingengine.models.model.Model

Model that always returns fitted values of 0. Residual is then equal to target. Used if you don’t want to partial out anything from a variable

__init__()¶
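The two trivial models in this module can be illustrated with a short sketch: a model that always predicts 0 leaves the target unchanged when residualizing, while a constant-only OLS predicts the sample mean and so demeans the target. The classes ZeroSketch and ConstantSketch and the residualize helper are hypothetical stand-ins, not the pricingengine classes.

```python
class ZeroSketch:
    """Always predicts 0, so the residual equals the target."""

    def fit(self, xs, ys):
        pass

    def predict(self, xs):
        return [0.0] * len(xs)


class ConstantSketch:
    """OLS on a constant only: the fitted value is the sample mean of y."""

    def fit(self, xs, ys):
        self.const = sum(ys) / len(ys)

    def predict(self, xs):
        return [self.const] * len(xs)


def residualize(model, xs, ys):
    """Fit the model, then return target minus fitted value."""
    model.fit(xs, ys)
    return [y - p for y, p in zip(ys, model.predict(xs))]


xs = [10, 20, 30]
ys = [2.0, 4.0, 6.0]
zero_residuals = residualize(ZeroSketch(), xs, ys)
constant_residuals = residualize(ConstantSketch(), xs, ys)
print(zero_residuals, constant_residuals)
```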
