
SHAP Workflows


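# SHAP Workflows and Best Practices

This document provides workflows, best practices, and common use cases for applying SHAP across a range of model interpretation scenarios.

## Basic Workflow Structure

Every SHAP analysis follows the same general workflow:

1. **Train the model**: Build and train a machine learning model
2. **Select an explainer**: Choose the explainer that matches the model type
3. **Compute SHAP values**: Generate explanations for test samples
4. **Visualize the results**: Use plots to understand feature impact
5. **Interpret and act**: Draw conclusions and make decisions

## Workflow 1: Basic Model Explanation

**Use case**: Understand the feature importance and prediction behavior of a trained model

```python
import shap
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Step 1: Load and split the data
# (X is assumed to be a pandas DataFrame of features and y the matching labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Step 2: Train the model (XGBoost as an example)
model = xgb.XGBClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)

# Step 3: Create the explainer
explainer = shap.TreeExplainer(model)

# Step 4: Compute SHAP values
shap_values = explainer(X_test)

# Step 5: Visualize global importance
shap.plots.beeswarm(shap_values, max_display=15)

# Step 6: Inspect the top features in detail
# ("Feature1" and "Feature2" are placeholders for column names in X)
shap.plots.scatter(shap_values[:, "Feature1"])
shap.plots.scatter(shap_values[:, "Feature2"], color=shap_values[:, "Feature1"])

# Step 7: Explain an individual prediction
shap.plots.waterfall(shap_values[0])
```

**Key decisions**:

- Explainer type, based on the model architecture
- Background dataset size (for DeepExplainer, KernelExplainer)
- Number of samples to explain (the full test set vs. a subset)

## Workflow 2: Model Debugging and Validation

**Use case**: Identify and fix model problems, and verify expected behavior

```python
import numpy as np

# Reuses model, X_test, and y_test from Workflow 1

# Step 1: Compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# Step 2: Identify prediction errors
predictions = model.predict(X_test)
errors = predictions != y_test
error_indices = np.where(errors)[0]  # positional indices of misclassified rows

# Step 3: Analyze the errors
print(f"Total errors: {len(error_indices)}")
print(f"Error rate: {len(error_indices) / len(y_test):.2%}")

# Step 4: Explain misclassified samples
for idx in error_indices[:10]:  # first 10 errors
    print(f"\n=== Error {idx} ===")
    print(f"Predicted: {predictions[idx]}, actual: {y_test.iloc[idx]}")
    shap.plots.waterfall(shap_values[idx])

# Step 5: Check
```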
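The error-identification part of Workflow 2 (steps 2 and 3) can be tried without a trained model. The following is a minimal sketch, assuming only NumPy. The helper name `summarize_errors`, its `max_examples` parameter, and the toy labels are hypothetical and are not part of SHAP or of the original workflow.

```python
import numpy as np


def summarize_errors(predictions, y_true, max_examples=10):
    """Return positions of misclassified samples and the overall error rate.

    Hypothetical helper for illustration; not part of the SHAP library.
    """
    # Convert to plain arrays so the comparison is purely positional,
    # regardless of any pandas index carried by the inputs.
    predictions = np.asarray(predictions)
    y_true = np.asarray(y_true)
    if predictions.shape != y_true.shape:
        raise ValueError("predictions and y_true must have the same shape")

    # Positions where the predicted label differs from the true label
    error_indices = np.flatnonzero(predictions != y_true)
    error_rate = len(error_indices) / len(y_true) if len(y_true) else 0.0

    # Only the first max_examples positions are returned for inspection;
    # the error rate still covers every sample.
    return error_indices[:max_examples], error_rate


# Toy labels, made up for illustration
y_true = [0, 1, 1, 0, 1, 0]
predictions = [0, 1, 0, 0, 1, 1]

error_indices, error_rate = summarize_errors(predictions, y_true)
print(f"Misclassified positions: {error_indices.tolist()}")
print(f"Error rate: {error_rate:.2%}")
```

In the full workflow, each returned position would be passed to `shap.plots.waterfall(shap_values[idx])`, as in step 4. Because the positions are positional rather than label-based, they line up with `y_test.iloc[idx]` even when `y_test` keeps a shuffled pandas index after `train_test_split`.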

Source: claude-code-templates (MIT). The Chinese translation was generated by AI.
