0 votes
2 views
in Python by (16.4k points)
closed by

I have been trying to compute SHAP values for a Gradient Boosting Classifier using the H2O module in Python. Below is an adapted version of the example from the documentation for the predict_contributions method:

import h2o
import shap
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o import H2OFrame

# initialize H2O
h2o.init()

# load JS visualization code to notebook
shap.initjs()

# Import the prostate dataset
h2o_df = h2o.import_file("https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")

# Split the data into Train/Test/Validation with Train having 70% and test and validation 15% each
train,test,valid = h2o_df.split_frame(ratios=[.7, .15])

# Convert the response column to a factor
h2o_df["CAPSULE"] = h2o_df["CAPSULE"].asfactor()

# Generate a GBM model using the training dataset
model = H2OGradientBoostingEstimator(distribution="bernoulli",
                                     ntrees=100,
                                     max_depth=4,
                                     learn_rate=0.1)

model.train(y="CAPSULE", x=["AGE","RACE","PSA","GLEASON"],training_frame=h2o_df)

# calculate SHAP values using function predict_contributions
contributions = model.predict_contributions(h2o_df)

# convert the H2O Frame to use with shap's visualization functions
contributions_matrix = contributions.as_data_frame().to_numpy() # the original method is as_matrix()

# shap values are calculated for all features
shap_values = contributions_matrix[:,0:4]

# expected values is the last returned column
expected_value = contributions_matrix[:,4].min()

# force plot for one observation
X=["AGE","RACE","PSA","GLEASON"]

shap.force_plot(expected_value, shap_values[0,:], X)

Check the below image, which I got from the above code:

What does the output mean? Since the problem above is a classification problem, the predicted value should be a probability (or even the predicted class, 0 or 1), right? Yet both the base value and the predicted value are negative.

Can anyone please help me?


4 Answers

0 votes
by (19k points)
 
Best answer

In the code above, the output consists of the computed SHAP (SHapley Additive exPlanations) values. They give the contribution of each feature to the model's prediction for a specific instance, and are obtained with the predict_contributions function.

Here the "expected value" is not a probability or a class prediction. It is the reference point (the base value) from which the individual feature contributions are measured. For a bernoulli GBM, H2O reports the contributions on the raw log-odds scale, before the inverse link function is applied, which is why both the base value and the predicted value can be negative. Individual SHAP values can likewise be positive or negative, meaning the feature pushes the prediction up or down.

More context about the specific problem would help in giving a more precise answer.

0 votes
by (26.4k points)

What you got is most likely log-odds and not a probability itself. To get a probability, you need to transform each log-odds value into probability space, for example

p=e^x/(1 + e^x)

When you use SHAP directly, you can achieve this by specifying the model_output parameter:

shap.TreeExplainer(model, data, model_output='probability')
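The log-odds-to-probability formula above can be sketched in a few lines of NumPy. This is only an illustration: the helper name log_odds_to_probability and the numbers in row are hypothetical and are not output from the model in the question. The sketch assumes a row holds four feature contributions followed by the bias term.

```python
import numpy as np

def log_odds_to_probability(x):
    """Map log-odds to probabilities with p = e^x / (1 + e^x)."""
    x = np.asarray(x, dtype=float)
    # 1 / (1 + e^-x) is algebraically the same as e^x / (1 + e^x)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical row: contributions for AGE, RACE, PSA, GLEASON, then the bias term
# (illustrative values, not taken from the prostate model).
row = np.array([0.30, -0.05, -0.60, -0.40, -0.45])

raw_prediction = row.sum()                            # prediction in log-odds space
probability = log_odds_to_probability(raw_prediction)
base_probability = log_odds_to_probability(row[-1])   # baseline as a probability

print(raw_prediction, probability, base_probability)
```

Note that only the summed raw prediction (or the base value on its own) converts cleanly this way; applying the formula to each feature contribution separately does not give per-feature probabilities that add up.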


0 votes
by (25.7k points)

In the given code, the output refers to the predicted contribution values calculated using the SHAP (SHapley Additive exPlanations) method. SHAP values measure the impact of each feature on the model's prediction for a specific instance, and one value is calculated for each feature in the input data.

In this case, the expected value represents the baseline prediction or average prediction of the model. It is not necessarily a probability or a classification value. The expected value helps establish a reference point for interpreting the contributions of individual features.

Regarding the negative values for both the base value and the expected value, it is important to note that SHAP values can be positive or negative. A positive SHAP value indicates that a feature contributes positively to the model's prediction, while a negative value suggests a negative contribution.
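To make the role of the baseline and the signs concrete, here is a minimal sketch with a made-up contribution matrix. The numbers are hypothetical, and the column layout (four features followed by a bias-term column) is an assumption mirroring the code in the question.

```python
import numpy as np

# Hypothetical contributions; assumed columns: AGE, RACE, PSA, GLEASON, bias term
contributions = np.array([
    [ 0.30, -0.05, -0.60, -0.40, -0.45],
    [-0.10,  0.02,  0.90,  0.70, -0.45],
])

feature_contribs = contributions[:, :-1]   # per-feature SHAP values
base_value = contributions[0, -1]          # bias term, identical in every row

# The raw prediction for each row is the baseline plus its feature contributions
raw_predictions = base_value + feature_contribs.sum(axis=1)

# True where a feature pushes the prediction above the baseline
pushes_up = feature_contribs > 0

print(raw_predictions)
print(pushes_up)
```

In the first row most features pull the prediction below an already negative baseline, while in the second row the positive contributions outweigh it, so the raw prediction ends up above zero.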

To gain further clarity on the specific issue you mentioned, it would be helpful to provide more context or details about the problem you are facing.

0 votes
by (15.4k points)

In the given code, the output refers to the calculated SHAP (SHapley Additive exPlanations) values, which represent the contribution of each feature to the model's prediction for a particular instance. The SHAP values are obtained using the predict_contributions function.

In this context, the "expected value" does not represent a probability or a classification prediction. Instead, it serves as a baseline reference for interpreting the contributions of individual features. Both the base value and the expected value can be negative, as SHAP values can be positive or negative depending on whether a feature has a positive or negative impact on the prediction.

To better understand the specific issue you mentioned, it would be helpful to provide more details or context regarding the problem you are encountering.
