pricingengine.models package¶
Submodules¶
pricingengine.models.boosting module¶
-
class
pricingengine.models.boosting.
BoostedTrees
(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)¶ Bases:
pricingengine.models.model.Model
Wrapper for the scikit-learn GradientBoostingRegressor model
-
__init__
(learning_rate=0.1, n_estimators=100, max_depth=3, random_state=None)¶ Parameters: - learning_rate – Shrinkage (regularization) parameter: the contribution of each tree is scaled by this factor
- n_estimators – Number of boosting stages (trees)
- max_depth – Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
- random_state – Seed controlling the randomness of the underlying estimator (None for no fixed seed)
-
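The role of learning_rate as a shrinkage factor can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps), and the helpers `fit_stump` and `boost` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Find the single split on xs that best fits the residuals (least squares)."""
    best = None
    pairs = sorted(zip(xs, residuals))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values equal: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, learning_rate=0.1, n_estimators=100):
    """Squared-error gradient boosting: each stump fits the current residuals
    and its contribution is shrunk by learning_rate."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the target
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_estimators):
        # For 0.5 * (y - p)^2 the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict
```

With learning_rate=1.0 a single stage fits the step in the data at once; with 0.1 each stage closes only 10% of the remaining gap, which is why smaller learning rates are typically paired with more estimators.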
pricingengine.models.causalmodel module¶
-
class
pricingengine.models.causalmodel.
CausalModel
¶ Bases:
pricingengine.models.linearmodel.LinearModel
Abstract base class for all linear causal models
-
SE_NAME
= 'se'¶
-
__init__
()¶
-
fit
(mtx_x, vec_y, cluster_groups=None)¶ Reimplements Model.fit to accept the additional cluster_groups parameter.
Parameters: - mtx_x – A pandas DataFrame of m rows by n columns
- vec_y – A pandas Series of length m
- cluster_groups – Optional pandas Series of length m indicating group ids for clustering
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶ Return the variance matrix of the computed model
-
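CausalModel.fit accepts cluster_groups, and get_standard_errors returns standard errors. The sketch below shows one common way cluster ids enter a variance estimate: the CR0 cluster-robust (sandwich) estimator for a single regressor without a constant, with the standard error taken as the square root of the variance. Whether the package uses this exact estimator or applies small-sample corrections is an assumption, and `ols_cluster_se` is a hypothetical name.

```python
import math
from collections import defaultdict


def ols_cluster_se(x, y, clusters=None):
    """Slope of y on x (no constant) with a CR0 cluster-robust standard error.

    If clusters is None, every observation is its own cluster, which gives the
    HC0 heteroskedasticity-robust standard error.
    """
    if clusters is None:
        clusters = list(range(len(x)))
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Sum the score contributions x_i * e_i within each cluster.
    scores = defaultdict(float)
    for xi, yi, g in zip(x, y, clusters):
        scores[g] += xi * (yi - beta * xi)
    meat = sum(s * s for s in scores.values())
    variance = meat / (sxx * sxx)  # sandwich: bread * meat * bread
    return beta, math.sqrt(variance)
```

When residuals within a cluster share a sign, their scores add up before squaring, so the clustered standard error exceeds the per-observation one.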
pricingengine.models.debiased_lasso module¶
-
class
pricingengine.models.debiased_lasso.
DebiasedLasso
(allowed_bias=None, lasso_tol=0.001)¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the scikit-learn LassoCV model
-
__init__
(allowed_bias=None, lasso_tol=0.001)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶
-
pricingengine.models.ensemble module¶
-
class
pricingengine.models.ensemble.
BucketSS
(models, num_splits)¶ Bases:
pricingengine.models.model.SampleSplitModel
A bucket of models simply picks the single best submodel, judged by submodel MSE. The final MSE is in-sample.
-
__init__
(models, num_splits)¶
-
static
gen_best_ensemble
(models, num_splits, norm_weights_one=False)¶ The “Best” model from CCDDHNR 2017 is gen_best_ensemble([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True)
-
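The selection rule BucketSS describes, keeping the single submodel with the lowest MSE, is small enough to sketch directly. Here each submodel is represented only by its predictions; `pick_best` and the dict-of-predictions input are hypothetical and are not the package's API.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_best(y_true, predictions_by_model):
    """Return (name, mse) of the submodel whose predictions have the lowest MSE."""
    scores = {name: mse(y_true, preds) for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

For example, `pick_best([1, 2, 3], {"Lasso": [1, 2, 4], "RandomForest": [1, 2, 3.5]})` selects "RandomForest", the submodel with the smaller squared error.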
-
class
pricingengine.models.ensemble.
CrossFitContainer
(base_models)¶ Bases:
pricingengine.models.model.SampleSplitModel
A convenient way to store separate copies of a model, each trained on a different fold of the data. The copies are usually instances of the same class. The MSE is out-of-sample.
-
__init__
(base_models)¶
-
base_models
¶ Access list of base models
-
get_feature_info
(avg_splits=False)¶ Returns a DataFrame with one row per fold (split) and one column per variable
-
static
wrap_generic_if_needed
(base_model, n_folds)¶
-
static
wrap_single_model_if_needed
(base_model, n_folds)¶
-
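CrossFitContainer keeps one model per fold so that every observation can be predicted by a model that never saw it, which is why its MSE is out-of-sample. The idea can be sketched with a deliberately trivial base model (the training mean); `cross_fit_predict` and the per-observation `folds` list are hypothetical illustrations, not the container's API.

```python
def cross_fit_predict(y, folds):
    """Out-of-fold predictions: observations in fold k are predicted by a
    model (here, the mean of y) fit only on observations outside fold k.
    Requires at least two distinct folds."""
    models = {}
    for k in set(folds):
        train = [yi for yi, f in zip(y, folds) if f != k]
        models[k] = sum(train) / len(train)  # one fitted "model" per fold
    return [models[f] for f in folds]
```

With `y = [1, 2, 3, 4]` and `folds = [0, 0, 1, 1]`, the first two observations are predicted from the mean of the last two, and vice versa.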
-
class
pricingengine.models.ensemble.
StackedSS
(models, num_splits, norm_weights_one=False)¶ Bases:
pricingengine.models.model.SampleSplitModel
Stacked generalization (stacking) weights the submodels by regressing the outcome on the models’ out-of-sample predictions. Some of the literature uses leave-one-out predictions; we use out-of-fold predictions instead. (NB: a “committee” would use equal weights.) The “Ensemble” model from CCDDHNR 2017 is StackedSS([Lasso, BoostedTrees, RandomForest, NeuralNet], 5, True). The MSE is in-sample. For multi-task learning, equal weights among outcomes are assumed.
-
__init__
(models, num_splits, norm_weights_one=False)¶
-
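Stacking regresses the outcome on the submodels' out-of-fold predictions. For two submodels, and assuming that norm_weights_one means the weights are constrained to sum to one (the docstring does not say how the normalization is done), the least-squares weight has a closed form: with d = p1 - p2 and r = y - p2, w1 = sum(d * r) / sum(d * d) and w2 = 1 - w1. `stack_two` is a hypothetical sketch of that special case only.

```python
def stack_two(y, p1, p2):
    """Weights (w1, w2) with w1 + w2 = 1 minimizing sum((y - w1*p1 - w2*p2)**2).

    Substituting w2 = 1 - w1 turns this into a one-variable regression of
    r = y - p2 on d = p1 - p2 with no intercept. Fails if p1 == p2 everywhere.
    """
    d = [a - b for a, b in zip(p1, p2)]
    r = [yi - b for yi, b in zip(y, p2)]
    w1 = sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)
    return w1, 1.0 - w1
```

If the outcome happens to be exactly 0.25 * p1 + 0.75 * p2, the function recovers those weights; a committee would instead fix both weights at 0.5.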
pricingengine.models.fast_debiased_lasso module¶
-
class
pricingengine.models.fast_debiased_lasso.
FastDebiasedLasso
(lasso_tol=0.001)¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the scikit-learn LassoCV model
-
__init__
(lasso_tol=0.001)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶
-
pricingengine.models.lasso module¶
-
class
pricingengine.models.lasso.
Lasso
(alpha=1.0)¶ Bases:
pricingengine.models.linearmodel.LinearModel
Wrapper for the scikit-learn Lasso model
-
__init__
(alpha=1.0)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
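Lasso wraps scikit-learn's Lasso, whose objective is (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1. The sketch below minimizes that objective by cyclic coordinate descent with soft-thresholding, in pure Python and without an intercept (an assumption made for brevity; scikit-learn fits an intercept by default). It illustrates how alpha shrinks coefficients and sets small ones to exactly zero; it is not the library's implementation.

```python
def soft_threshold(value, threshold):
    """S(v, t) = sign(v) * max(|v| - t, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by coordinate descent.

    X is a list of rows; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except j's.
            partial = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                       for i in range(n)]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w
```

On an orthogonal design, alpha=0 reproduces least squares, while a moderate alpha shrinks the larger coefficient and zeroes the smaller one.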
-
class
pricingengine.models.lasso.
LassoCV
(tol=0.0001, max_iter=1000)¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the scikit-learn LassoCV model
-
__init__
(tol=0.0001, max_iter=1000)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶ Return the variance matrix of the computed model
-
variance
¶ Return the underlying results object (contains more statistical properties)
-
pricingengine.models.linearmodel module¶
-
class
pricingengine.models.linearmodel.
LinearModel
¶ Bases:
pricingengine.models.model.Model
Abstract base class for all linear models
-
COEF_NAME
= 'coef'¶
-
__init__
()¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
pricingengine.models.model module¶
-
class
pricingengine.models.model.
Model
¶ Bases:
object
Abstract base class for all basic models. It handles:
* removing incomplete observations
* making sure the return type has an appropriate container (e.g., a pandas index)
* type checks
-
AVG_ERROR_COL_NAME
= 'avg error'¶
-
CONST_NAME
= 'const'¶
-
ERROR_VAR_NAME
= 'error'¶
-
MAE_NAME
= 'MAE'¶
-
MSE_NAME
= 'MSE'¶
-
ONLY_SUBMODEL
= 'only'¶
-
PREDICTION_COL_NAME
= 'prediction'¶
-
R2_NAME
= 'R2'¶
-
RESIDUAL_COL_NAME
= 'residual'¶
-
SMAPE_NAME
= 'sMAPE'¶
-
SUBMODEL_COLNAME
= 'Submodel'¶
-
VARIABLE_COLNAME
= 'variable'¶
-
__init__
()¶
-
coeff_of_determination
¶ r^2/R^2: Coefficient of determination. Ignores NaNs.
-
copy
()¶ Return a copy of the model object
-
fit
(x, y)¶ Fit the model to the given training and target data
Parameters: - x – A pandas DataFrame of m rows by n columns
- y – A pandas Series of length m
-
fit_and_predict
(x, y)¶ A combined fit and predict that may be somewhat more data-efficient. Sets accuracy statistics. Currently, does not predict for observations where the outcome is missing.
Parameters: - x – matrix of features
- y – vector of realized outcomes
-
fit_predict_and_residualize
(x, y)¶ Fit the model, predict, and compute residuals in one step. Sets accuracy statistics.
-
static
get_flat_array
(data)¶
-
static
get_possible_col_index
(data)¶
-
mae
¶ mae: Mean Absolute Error. Ignores NaNs.
-
mse
¶ mse: Mean Squared Error, also called mean squared prediction error (MSPE). Ignores NaNs.
-
n_tasks
¶ Returns the number of “tasks”. 1 indicates standard estimation; more than 1 indicates multi-task learning.
-
static
n_tasks_from_y
(y)¶
-
predict
(x, y_true=None)¶ Predict outcomes for the given data using the previously fit model
Parameters: - x – A pandas DataFrame of k rows and n columns
- y_true – If passed, accuracy statistics are set
Returns: The pandas Series of length k obtained by applying the fitted model to the given values
-
predict_and_residualize
(x, y)¶ Compute residuals using a previously fit model. Sets accuracy statistics.
Parameters: - x – matrix of features
- y – vector of realized outcomes
-
static
put_prediction_in_whole
(prediction, full_n, inclusion_mask)¶
-
rmse
¶ rmse: Root Mean Squared Error, also called root mean squared prediction error (RMSPE). Ignores NaNs.
-
smape
¶ smape: Symmetric Mean Absolute Percentage Error. Use this rather than MAPE, which can be infinite when an actual value is zero. Ignores NaNs.
-
static
wrap_y_if_needed
(y, x, name)¶
-
x_column_index
¶ Return the column Index/MultiIndex from the training data
-
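Model removes incomplete observations before fitting, and put_prediction_in_whole(prediction, full_n, inclusion_mask) suggests that predictions for the retained rows are then placed back into a full-length result. A plausible reading, sketched below with plain lists, is that excluded rows receive NaN. The exact behaviour and return type of the real method (which presumably works on pandas or NumPy objects) are assumptions, and `complete_rows` and `scatter_into_full` are hypothetical stand-ins.

```python
import math


def complete_rows(x_rows, y):
    """Mask of rows whose features and outcome are all non-missing."""
    return [not math.isnan(yi) and not any(math.isnan(v) for v in row)
            for row, yi in zip(x_rows, y)]


def scatter_into_full(prediction, full_n, inclusion_mask):
    """Place predictions for included rows back into a length-full_n list,
    filling excluded rows with NaN."""
    if len(inclusion_mask) != full_n or sum(inclusion_mask) != len(prediction):
        raise ValueError("mask length must be full_n, with one prediction per included row")
    whole = [math.nan] * full_n
    values = iter(prediction)
    for i, keep in enumerate(inclusion_mask):
        if keep:
            whole[i] = next(values)
    return whole
```

A model would fit and predict only on the rows flagged by the mask, then scatter those predictions back so the output lines up with the original index.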
-
class
pricingengine.models.model.
SampleSplitModel
(num_splits)¶ Bases:
object
ABC for models that keep track of which parts of the data were used to fit which components. The most common case is several ML models that are fit on one section of the data and then used to predict on another. It handles:
* removing incomplete observations
* making sure the return type has an appropriate container (e.g., a pandas index)
* type checks
-
FOLD_COLNAME
= 'fold'¶
-
NO_SPLIT
= 'no split'¶
-
RECORDER_TRAININGON_NAME
= 'trainingOn'¶
-
__init__
(num_splits)¶
-
coeff_of_determination
¶ r^2/R^2: Coefficient of determination. Ignores NaNs.
-
fit
(x, y, folds)¶ Fit the model to the given training and target data
Parameters: - x – A pandas DataFrame of m rows by n columns
- y – A pandas Series of length m
- folds –
-
fit_and_predict
(x, y, folds)¶ A combined fit and predict that may be somewhat more data-efficient. Sets accuracy statistics. Currently, does not predict for observations where the outcome is missing.
Parameters: - x – matrix of features
- y – vector of realized outcomes
- folds –
-
static
fit_and_predict_mult
(ssmodels, features, y_multi, folds)¶ Parameters: - ssmodels (dict) –
- features (dict) –
- y_multi (dict) –
- folds –
-
fit_predict_and_residualize
(x, y, folds)¶ Fit the model, predict, and compute residuals in one step. Sets accuracy statistics.
-
get_fit_diagnostics
()¶ Return a single-row DataFrame of MAE, MSE, sMAPE, and R2.
-
mae
¶ mae: Mean Absolute Error. Ignores NaNs.
-
mse
¶ mse: Mean Squared Error, also called mean squared prediction error (MSPE). Ignores NaNs.
-
n_tasks
¶ Returns the number of “tasks”. 1 indicates standard estimation; more than 1 indicates multi-task learning.
-
num_splits
¶ Number of divisions into which to split data for computing residuals
-
predict
(x, folds=None, y_true=None)¶ Parameters: - x –
- folds – If None, assumes that none of this data was used for fitting.
- y_true – If passed, accuracy statistics are set
-
predict_and_residualize
(x, y, folds=None)¶ Sets accuracy statistics
-
static
predict_mult
(ssmodels, features, folds)¶ Parameters: - ssmodels (dict) –
- features (dict) –
- folds –
-
rmse
¶ rmse: Root Mean Squared Error, also called root mean squared prediction error (RMSPE). Ignores NaNs.
-
smape
¶ smape: Symmetric Mean Absolute Percentage Error. Use this rather than MAPE, which can be infinite when an actual value is zero. Ignores NaNs.
-
x_column_index
¶ Return the column Index/MultiIndex from the training data
-
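The accuracy statistics documented for Model and SampleSplitModel (MAE, MSE, RMSE, sMAPE, and R2, each ignoring NaNs) can be computed as below. This is a sketch: the sMAPE convention (the mean of 2|y - yhat| / (|y| + |yhat|), as a fraction rather than a percentage) and dropping a pair when either value is NaN are assumptions about the package's definitions. `fit_diagnostics` is a hypothetical name that returns a dict in place of the single-row DataFrame produced by get_fit_diagnostics.

```python
import math


def fit_diagnostics(y_true, y_pred):
    """MAE, MSE, RMSE, sMAPE and R2 over pairs where neither value is NaN."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred)
             if not (math.isnan(t) or math.isnan(p))]
    n = len(pairs)
    errors = [t - p for t, p in pairs]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # sMAPE stays finite when y is 0 (unlike MAPE); a pair where both the
    # actual and the prediction are 0 would divide by zero and is not handled.
    smape = sum(2 * abs(t - p) / (abs(t) + abs(p)) for t, p in pairs) / n
    mean_t = sum(t for t, _ in pairs) / n
    ss_tot = sum((t - mean_t) ** 2 for t, _ in pairs)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "sMAPE": smape, "R2": r2}
```

The dict keys reuse the strings of Model.MAE_NAME, MSE_NAME, SMAPE_NAME, and R2_NAME; RMSE has no corresponding constant in the class.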
pricingengine.models.multitask_ols module¶
-
class
pricingengine.models.multitask_ols.
MultiTaskOLS
(add_const=False)¶ Bases:
pricingengine.models.model.Model
-
__init__
(add_const=False)¶
-
pricingengine.models.neural_net module¶
-
class
pricingengine.models.neural_net.
NeuralNet
(hidden_layer_sizes=(2, ), max_iter=200)¶ Bases:
pricingengine.models.model.Model
Wrapper for the scikit-learn MLPRegressor model
-
__init__
(hidden_layer_sizes=(2, ), max_iter=200)¶
-
pricingengine.models.ols module¶
-
class
pricingengine.models.ols.
OLS
(add_const=False)¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the statsmodels OLS model
-
__init__
(add_const=False)¶
-
static
add_const
(mtx_x)¶
-
add_est_results
¶ Return the underlying results object (contains more statistical properties)
-
fit
(mtx_x, vec_y, cluster_groups=None)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶
-
predict
(mtx_x, vec_y_true=None)¶ Overrides Model.predict.
-
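OLS wraps statsmodels, and with add_const=True a column of ones is added to the regressors. For a single regressor, the resulting intercept and slope have the familiar closed form below. `simple_ols` is a hypothetical pure-Python illustration, not the wrapper's code.

```python
def simple_ols(x, y, add_const=True):
    """Closed-form least squares for one regressor, optionally with an intercept.

    Returns (intercept, slope); the intercept is 0.0 when add_const is False.
    """
    n = len(x)
    if not add_const:
        slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        return 0.0, slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

For data generated as y = 1 + 2x, including the constant recovers (1, 2) exactly, while forcing the line through the origin biases the slope upward.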
pricingengine.models.post_lasso module¶
-
class
pricingengine.models.post_lasso.
PostLasso
(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper that fits a statsmodels OLS model on the variables selected by a scikit-learn LassoCV model
-
__init__
(lasso_model=LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False))¶
-
add_est_results
¶ Return the underlying results object (contains more statistical properties)
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the standard errors of the computed model
-
get_variance_matrix
()¶
-
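PostLasso runs OLS after a LassoCV step. The usual post-lasso recipe keeps the regressors whose lasso coefficients are nonzero and refits unpenalized OLS on just those columns, so the surviving coefficients are no longer shrunk. The sketch below assumes the lasso coefficients are already available and solves the OLS normal equations by Gaussian elimination; `post_lasso_refit` and `solve` are hypothetical names, and no intercept is fitted for brevity.

```python
def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial
    pivoting. Assumes the matrix is nonsingular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def post_lasso_refit(X, y, lasso_coefs, tol=1e-12):
    """Refit OLS (no intercept) on the columns with nonzero lasso coefficients.

    Returns a full-length coefficient list with zeros for dropped columns.
    """
    support = [j for j, c in enumerate(lasso_coefs) if abs(c) > tol]
    coefs = [0.0] * len(lasso_coefs)
    if not support:
        return coefs
    Xs = [[row[j] for j in support] for row in X]
    k = len(support)
    # Normal equations: (Xs' Xs) w = Xs' y
    xtx = [[sum(r[i] * r[q] for r in Xs) for q in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xs, y)) for i in range(k)]
    for j, w in zip(support, solve(xtx, xty)):
        coefs[j] = w
    return coefs
```

If the data are generated as y = 3 * x0 - 2 * x2 and the lasso returned shrunk values such as [2.5, 0.0, -1.4], the refit recovers [3, 0, -2].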
pricingengine.models.prepredicted module¶
-
class
pricingengine.models.prepredicted.
PrePredicted
(prediction)¶ Bases:
pricingengine.models.model.Model
Currently only works for standard (non-multi-task learning) models
-
__init__
(prediction)¶ Parameters: prediction – Must have the same index as the DataFrame used for prediction
-
-
class
pricingengine.models.prepredicted.
SSPrePredicted
(value)¶ Bases:
pricingengine.models.model.SampleSplitModel
Store results for a model that has already been fit and predicted. This is used for baseline models to allow us to separate that part of the analysis pipeline.
To store the data from a built-in DDML model: ::
    full_baseline_preds = pd.DataFrame()
    ddml.predict(ds, ret_pred=full_baseline_preds)
    SSPrePredicted.write_rec_df_to_csv(recorder_fname, full_baseline_preds)
The file's first columns hold the data (observation) index, followed by the columns Variable, Lead, and Prediction. Rows in the recorder file are uniquely identified by (data-index, Variable, Lead).
To load the data from this type of CSV: ::
    builtin_predictions = SSPrePredicted.get_rec_df_from_csv(recorder_fname, schema)
    baselines = DynamicDML.gen_prepredicted_baselines(builtin_predictions)
    ddml2 = DynamicDML(baseline_model=baselines, ...)
If you estimate a DynamicDML model on the whole data and then another on a subset (e.g., for out-of-sample evaluation), you will need different SSPrePredicted objects because the folds will differ.
-
__init__
(value)¶
-
get_coefficients
()¶
-
static
get_rec_df_from_csv
(fname, schema, extra_index_vars=None)¶ Parameters: - fname – Filename
- schema – Schema
- extra_index_vars – Examples: [Model.VARIABLE_COLNAME, DynamicDML.LEAD_LEVEL_NAME]
-
static
write_rec_df_to_csv
(fname, recs)¶
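The SSPrePredicted docstring describes a recorder file whose leading columns are the observation index, followed by Variable, Lead, and Prediction, with each row uniquely identified by (data-index, Variable, Lead). The sketch below round-trips that layout with Python's csv module to show the keying. The single index column named "obs", the helper names, and parsing Lead as an integer and Prediction as a float are assumptions; the real methods (write_rec_df_to_csv, get_rec_df_from_csv) work with pandas DataFrames and a schema instead.

```python
import csv
import io


def write_recorder(handle, records):
    """Write records of the form (obs_index, variable, lead, prediction)."""
    writer = csv.writer(handle)
    writer.writerow(["obs", "Variable", "Lead", "Prediction"])
    writer.writerows(records)


def read_recorder(handle):
    """Return {(obs_index, variable, lead): prediction}, enforcing unique keys."""
    out = {}
    for row in csv.DictReader(handle):
        key = (row["obs"], row["Variable"], int(row["Lead"]))
        if key in out:
            raise ValueError(f"duplicate recorder row for {key}")
        out[key] = float(row["Prediction"])
    return out
```

An in-memory buffer such as `io.StringIO(newline="")` can stand in for the file when trying this out.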
pricingengine.models.randomforest module¶
-
class
pricingengine.models.randomforest.
RandomForest
(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)¶ Bases:
pricingengine.models.model.Model
Wrapper for the scikit-learn RandomForestRegressor model
-
__init__
(n_estimators=10, max_depth=None, n_jobs=1, random_state=None)¶
-
pricingengine.models.ridge module¶
-
class
pricingengine.models.ridge.
Ridge
(alpha)¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the scikit-learn RidgeCV model
-
ALPHA_MAX
= 999999¶
-
ALPHA_MIN
= 1e-08¶
-
__init__
(alpha)¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the coefficients of the computed model
-
get_variance_matrix
()¶
-
variance
¶ Return the underlying results object (contains more statistical properties)
-
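Ridge takes a penalty alpha, and the class attributes ALPHA_MIN = 1e-08 and ALPHA_MAX = 999999 suggest a permitted range for it. The closed form below, for one regressor without an intercept, shows how alpha shrinks the slope toward zero. Clipping alpha into [ALPHA_MIN, ALPHA_MAX] is an assumption about how the bounds are used, and `ridge_slope` is a hypothetical helper.

```python
ALPHA_MIN, ALPHA_MAX = 1e-08, 999999  # values copied from the Ridge class attributes


def ridge_slope(x, y, alpha):
    """Minimize sum((y - w*x)**2) + alpha * w**2 for a single regressor."""
    alpha = min(max(alpha, ALPHA_MIN), ALPHA_MAX)  # assumed use of the bounds
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)
```

When alpha equals sum(x**2), the ridge slope is exactly half the least-squares slope; as alpha approaches ALPHA_MAX the slope is driven close to zero.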
-
class
pricingengine.models.ridge.
RidgeCV
¶ Bases:
pricingengine.models.causalmodel.CausalModel
Wrapper for the scikit-learn RidgeCV model
-
ALPHA_MAX
= 999999¶
-
ALPHA_MIN
= 1e-08¶
-
__init__
()¶
-
get_coefficients
()¶ Return the coefficients of the computed model
-
get_standard_errors
()¶ Return the coefficients of the computed model
-
get_variance_matrix
()¶
-
variance
¶ Return the underlying results object (contains more statistical properties)
-
pricingengine.models.zero module¶
-
class
pricingengine.models.zero.
One
¶ Bases:
pricingengine.models.model.Model
Always fits a constant-only OLS model, so its fitted values equal the mean of the target
-
__init__
()¶
-
-
class
pricingengine.models.zero.
Zero
¶ Bases:
pricingengine.models.model.Model
Model that always returns fitted values of 0, so the residual equals the target. Use it when you don’t want to partial out anything from a variable.
-
__init__
()¶
-
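Zero and One act as baselines for the partialling-out step: residualizing against Zero leaves the target unchanged, while One (constant-only OLS) subtracts the target's mean, because least squares on a column of ones is minimized by the mean. A minimal sketch with hypothetical function names:

```python
def residualize_zero(y):
    """Zero model: fitted values are 0, so residuals equal the target."""
    return [yi - 0.0 for yi in y]


def residualize_one(y):
    """One model: constant-only OLS fits the mean, so residuals are demeaned y."""
    mean = sum(y) / len(y)
    return [yi - mean for yi in y]
```

For `y = [1.0, 2.0, 6.0]`, Zero returns the values unchanged and One returns them centred on their mean of 3.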