

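      ML with shap: a detailed walkthrough of using SHAP values to interpret an XGBoost model on the adult census income binary-classification dataset (predicting whether annual income exceeds 50K)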

       處女座的程序猿, published 2022-07-06 in Shanghai


      Interpretability case study: SHAP values for an XGBoost model on the adult census income binary-classification dataset (predicting whether annual income exceeds 50K)

      1. Define the dataset

      dtypes_len: 15

      age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | salary
      39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K
      50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K
      38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K
      53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K
      28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K
      37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K
      49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | <=50K
      52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K
      31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | >50K
      42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States | >50K

      2. Dataset preprocessing

      # 2.1 Preliminary selection of model input features

      df.columns
      14

      # 2.2 Binarize the target feature
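The article does not show the code for this step. A minimal sketch, assuming the raw label column is named `salary` and holds the strings `<=50K` and `>50K` as in the preview in section 1. The small frame `df_demo` is made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame; only the label column matters here.
df_demo = pd.DataFrame({"salary": ["<=50K", ">50K", " <=50K"]})

# Strip stray whitespace first (raw CSV exports often pad values),
# then map ">50K" to 1 and everything else to 0.
df_demo["salary"] = (df_demo["salary"].str.strip() == ">50K").astype(int)

print(df_demo["salary"].tolist())
```

The resulting 0/1 column is what appears as `salary` in the tables of section 2.4.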

      # 2.3 Encode categorical features as integers

      filt_dtypes_len: 13 [('age', 'float32'), ('workclass', 'category'), ('fnlwgt', 'float32'), ('education_Num', 'float32'), ('marital_status', 'category'), ('occupation', 'category'), ('relationship', 'category'), ('race', 'category'), ('sex', 'category'), ('capital_gain', 'float32'), ('capital_loss', 'float32'), ('hours_per_week', 'float32'), ('native_country', 'category')]
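The dtype list above shows the string columns stored as pandas `category`. One common way to turn such a column into integers is to take its category codes. The sketch below assumes that approach; the article does not confirm which encoder it uses, and the mini-column is hypothetical.

```python
import pandas as pd

# Hypothetical mini-column; the real workclass column has more levels.
workclass = pd.Series(["State-gov", "Private", "Self-emp-not-inc", "Private"])

# Convert to the 'category' dtype, then read the integer codes.
# Codes follow the sorted order of the category labels.
as_cat = workclass.astype("category")
codes = as_cat.cat.codes

print(list(as_cat.cat.categories))
print(codes.tolist())
```

Because codes depend on the full set of levels in a column, the numbers here differ from those in the encoded `df_adult` table, where the complete column is used.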

      # 2.4 Separate features and label

      df_adult_display

      index | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | salary
      0 | 39 | State-gov | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | 0
      1 | 50 | Self-emp-not-inc | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | 0
      2 | 38 | Private | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | 0
      3 | 53 | Private | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | 0
      4 | 28 | Private | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | 0
      5 | 37 | Private | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | 0
      6 | 49 | Private | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | 0
      7 | 52 | Self-emp-not-inc | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | 1
      8 | 31 | Private | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | 1
      9 | 42 | Private | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States | 1

      df_adult

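      index | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | salary
      0 | 39 | 7 | 13 | 4 | 1 | 1 | 4 | 1 | 2174 | 0 | 40 | 39 | 0
      1 | 50 | 6 | 13 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 13 | 39 | 0
      2 | 38 | 4 | 9 | 0 | 6 | 1 | 4 | 1 | 0 | 0 | 40 | 39 | 0
      3 | 53 | 4 | 7 | 2 | 6 | 0 | 2 | 1 | 0 | 0 | 40 | 39 | 0
      4 | 28 | 4 | 13 | 2 | 10 | 5 | 2 | 0 | 0 | 0 | 40 | 5 | 0
      5 | 37 | 4 | 14 | 2 | 4 | 5 | 4 | 0 | 0 | 0 | 40 | 39 | 0
      6 | 49 | 4 | 5 | 3 | 8 | 1 | 2 | 0 | 0 | 0 | 16 | 23 | 0
      7 | 52 | 6 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 45 | 39 | 1
      8 | 31 | 4 | 14 | 4 | 10 | 1 | 4 | 0 | 14084 | 0 | 50 | 39 | 1
      9 | 42 | 4 | 13 | 2 | 4 | 0 | 4 | 1 | 5178 | 0 | 40 | 39 | 1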

      # 2.5 Overall dataset split

      df_len: 32561 ,train_test_index: 30933
      X.shape,y.shape: (30933, 12) (30933,)
      X_test.shape,y_test.shape: (1628, 12) (1628,)
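The printed shapes are consistent with holding out the last 5% of the rows as a test set. The sketch below reproduces those shapes under that assumption. The 5% fraction and the positional (unshuffled) split are inferred from the numbers, not stated in the article, and the arrays are dummies.

```python
import numpy as np

n_rows = 32561                    # df_len printed above
test_fraction = 0.05              # assumption inferred from the printed shapes

n_test = int(n_rows * test_fraction)
train_test_index = n_rows - n_test

# Dummy feature matrix and labels with the same dimensions as in the article.
features = np.zeros((n_rows, 12), dtype=np.float32)
labels = np.zeros(n_rows, dtype=np.int64)

# Everything before the index is kept for modelling, the rest is the test set.
X, y = features[:train_test_index], labels[:train_test_index]
X_test, y_test = features[train_test_index:], labels[train_test_index:]

print(train_test_index, X.shape, y.shape, X_test.shape, y_test.shape)
```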

      # 3. Model training and inference

      # 3.1 Split the dataset

      # 3.2 Build and train the model

      # 3.3 Model prediction

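      index | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | y_val_predi | y_val
      11311 | 29 | 4 | 9 | 4 | 1 | 3 | 2 | 0 | 0 | 0 | 60 | 39 | 0 | 0
      12519 | 33 | 4 | 10 | 4 | 3 | 1 | 2 | 1 | 8614 | 0 | 40 | 39 | 1 | 1
      29225 | 27 | 4 | 13 | 4 | 10 | 1 | 4 | 1 | 0 | 0 | 45 | 39 | 0 | 0
      5428 | 22 | 4 | 9 | 2 | 7 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | 0 | 0
      2400 | 32 | 7 | 10 | 4 | 1 | 1 | 2 | 0 | 0 | 0 | 40 | 39 | 0 | 0
      4319 | 45 | 4 | 10 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | 1 | 0
      26564 | 43 | 4 | 9 | 2 | 6 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | 0 | 0
      4721 | 60 | 0 | 13 | 2 | 0 | 0 | 4 | 1 | 0 | 0 | 8 | 39 | 0 | 1
      19518 | 29 | 6 | 9 | 2 | 12 | 0 | 4 | 1 | 0 | 0 | 35 | 39 | 0 | 0
      25013 | 33 | 4 | 5 | 2 | 6 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | 0 | 0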

      # 4. Visualizing and interpreting model feature importance

      # 4.1 Global feature importance visualization

      # T1. Feature importance reported by the model itself

      XGBR_importance_dict: [('age', 130), ('capital_gain', 125), ('education_num', 86), ('capital_loss', 75), ('hours_per_week', 63), ('relationship', 59), ('marital_status', 52), ('occupation', 52), ('workclass', 20), ('sex', 13), ('native_country', 10), ('race', 6)]
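The integer scores above look like split counts, which is what XGBoost's `weight` importance type reports; that reading is an assumption, since the article does not say how the dictionary was produced. A small sketch that ranks the printed scores and expresses each as a share of the total:

```python
# Scores copied from the XGBR_importance_dict output above.
importance = {
    "age": 130, "capital_gain": 125, "education_num": 86, "capital_loss": 75,
    "hours_per_week": 63, "relationship": 59, "marital_status": 52,
    "occupation": 52, "workclass": 20, "sex": 13, "native_country": 10,
    "race": 6,
}

total = sum(importance.values())

# Sort by score, highest first, and attach each feature's share of the total.
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name:<15} {score:>4} {score / total:6.1%}")
```

Shares make it easier to compare this ranking with a SHAP-based one, which is measured on a different scale.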

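      # T2. Explain the XGBR model with SHAP values

      Use shap's built-in functions to visualize feature contributions. The resulting feature ranking is similar to the one above, but not identical.

      # (1) Create the Explainer and compute SHAP values

      # T2.1 Output a shap.Explanation object

      # T2.2 Output a numpy.array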

      shap2exp.values.shape (30933, 12) 
       [[ 0.31074238 -0.16607898  0.5617416  ... -0.04660619 -0.09465054
         0.00530914]
       [ 0.34912622 -0.16633348  0.65308005 ... -0.06718991 -0.9804511
         0.00515459]
       [ 0.21971266  0.02263742 -0.299867   ... -0.0583196  -0.09738331
         0.00415599]
       ...
       [-0.48140627  0.07019287 -0.30844492 ... -0.04253047 -0.10924102
         0.00649792]
       [ 0.39729887 -0.2313431  -0.45257783 ... -0.06502013  0.27416423
         0.00587647]
       [ 0.27594262  0.03170239  0.78293955 ... -0.06743324  0.31613
         0.00530914]]
      shap2array.shape (30933, 12) 
       [[ 0.31074238 -0.16607898  0.5617416  ... -0.04660619 -0.09465054
         0.00530914]
       [ 0.34912622 -0.16633348  0.65308005 ... -0.06718991 -0.9804511
         0.00515459]
       [ 0.21971266  0.02263742 -0.299867   ... -0.0583196  -0.09738331
         0.00415599]
       ...
       [-0.48140627  0.07019287 -0.30844492 ... -0.04253047 -0.10924102
         0.00649792]
       [ 0.39729887 -0.2313431  -0.45257783 ... -0.06502013  0.27416423
         0.00587647]
       [ 0.27594262  0.03170239  0.78293955 ... -0.06743324  0.31613
         0.00530914]]
      Are the two matrices shap2exp.values and shap2array equal: True

      # (2) Bar plot of per-feature SHAP values over all samples
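A global SHAP bar plot summarizes each feature by its mean absolute SHAP value over the plotted samples. The sketch below computes that quantity by hand from the four complete SHAP vectors printed in section 4.2 (samples 0, 1, 10 and 20). With only four rows the ranking is illustrative and will not match the full-data plot.

```python
import numpy as np

feature_names = ["age", "workclass", "education_num", "marital_status",
                 "occupation", "relationship", "race", "sex", "capital_gain",
                 "capital_loss", "hours_per_week", "native_country"]

# SHAP vectors copied from the section 4.2 printouts (samples 0, 1, 10, 20).
shap_rows = np.array([
    [0.31074238, -0.16607898, 0.5617416, -0.58709425, -0.08897061, -0.6133537,
     0.01539118, 0.04758333, -0.3988452, -0.04660619, -0.09465054, 0.00530914],
    [0.34912622, -0.16633348, 0.65308005, 0.3069151, 0.26878497, 0.5229906,
     0.01030679, 0.04531586, -0.15429462, -0.06718991, -0.9804511, 0.00515459],
    [0.27578622, 0.02686635, -0.0699547, 0.2820353, 0.3097189, 0.55229187,
     -0.03686382, 0.05135565, -0.1607191, -0.06321771, 0.38190693, 0.02023092],
    [0.31008577, 0.00316932, 1.3133987, 0.16768128, 0.18239255, 0.6863757,
     0.00508371, 0.05159741, -0.15813455, -0.06736177, 0.31327826, 0.01936885],
])

# Mean absolute SHAP value per feature (one number per column).
mean_abs = np.abs(shap_rows).mean(axis=0)

# Rank features from most to least influential.
ranked = sorted(zip(feature_names, mean_abs), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name:<15} {score:.4f}")
```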

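      # Visualizing higher-order SHAP interactions

      # (3) Beeswarm plot of per-feature SHAP values over all samples

      # (4) Scatter plot of the global feature importance ranking

      # 4.2 Local feature importance visualization

      # (1) Bar plot of all features for a single sample

      Current test sample: 0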

      .values =
      array([ 0.31074238, -0.16607898,  0.5617416 , -0.58709425, -0.08897061,
             -0.6133537 ,  0.01539118,  0.04758333, -0.3988452 , -0.04660619,
             -0.09465054,  0.00530914], dtype=float32)
      .base_values =
      -1.3270257
      .data =
      array([3.900e+01, 7.000e+00, 1.300e+01, 4.000e+00, 1.000e+00, 1.000e+00,
             4.000e+00, 1.000e+00, 2.174e+03, 0.000e+00, 4.000e+01, 3.900e+01])

      Current test sample: 1

      .values =
      array([ 0.34912622, -0.16633348,  0.65308005,  0.3069151 ,  0.26878497,
              0.5229906 ,  0.01030679,  0.04531586, -0.15429462, -0.06718991,
             -0.9804511 ,  0.00515459], dtype=float32)
      .base_values =
      -1.3270257
      .data =
      array([50.,  6., 13.,  2.,  4.,  0.,  4.,  1.,  0.,  0., 13., 39.])

      Current test sample: 10

      .values =
      array([ 0.27578622,  0.02686635, -0.0699547 ,  0.2820353 ,  0.3097189 ,
              0.55229187, -0.03686382,  0.05135565, -0.1607191 , -0.06321771,
              0.38190693,  0.02023092], dtype=float32)
      .base_values =
      -1.3270257
      .data =
      array([37.,  4., 10.,  2.,  4.,  0.,  2.,  1.,  0.,  0., 80., 39.])

      Current test sample: 20

      .values =
      array([ 0.31008577,  0.00316932,  1.3133987 ,  0.16768128,  0.18239255,
              0.6863757 ,  0.00508371,  0.05159741, -0.15813455, -0.06736177,
              0.31327826,  0.01936885], dtype=float32)
      .base_values =
      -1.3270257
      .data =
      array([40.,  4., 16.,  2., 10.,  0.,  4.,  1.,  0.,  0., 60., 39.])
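Each printout above satisfies the additivity property of SHAP: the base value plus the sum of a sample's SHAP values equals the model's raw output for that sample. The sketch below checks this for sample 0. Converting the result to a probability assumes the model was trained with a logistic objective, so that the raw output is a log-odds score; the negative base value is consistent with that, but the article does not state it.

```python
import numpy as np

# Values copied from the printout for sample 0 above.
values = np.array([0.31074238, -0.16607898, 0.5617416, -0.58709425,
                   -0.08897061, -0.6133537, 0.01539118, 0.04758333,
                   -0.3988452, -0.04660619, -0.09465054, 0.00530914])
base_value = -1.3270257

# Additivity: base value plus all contributions gives the raw model output.
margin = base_value + values.sum()

# Assumption: the raw output is a log-odds score, so the logistic function
# turns it into a probability of the positive class (salary > 50K).
probability = 1.0 / (1.0 + np.exp(-margin))

print(f"margin = {margin:.4f}, probability = {probability:.4f}")
```

Under that assumption the low probability agrees with the label of row 0 in the `df_adult` table, whose `salary` is 0.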

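      # (2) Dependence scatter plot over all samples, going from a single feature to a pair of features

      # (3) Two-feature scatter plot over all samples

      # 4.3 Model feature selection

      # (1) Clustering-based SHAP feature selection visualization

      5. Interpreting model predictions (with a focus on misclassified samples)

      This part gives the details behind a prediction and focuses on explaining how a single prediction is produced. It helps decision makers trust the model and shows how each feature influences one particular decision.

      # 5.1 Force plot analysis: visualize how each feature's contribution compares with the model's predicted value for one or more samples, to investigate misclassified samples

      A force plot explains a single model prediction. It can be used for error analysis, to find the explanation behind the prediction for a specific instance. Taking sample 0 as an example:
      (1) Model output value: 5.89.
      (2) Base value: the base value is explainer.expected_value, that is, the average of the model output over the training data.
      (3) The numbers below the arrows are this instance's feature values, for example Age=39.
      (4) Red marks a feature with a positive contribution (it pushes the prediction higher); blue marks a feature with a negative contribution (it pushes the prediction lower). Length indicates influence: the longer the arrow, the larger the feature's effect on the output. The tick values on the x-axis show how much the output decreases or increases.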
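The red and blue sides of a force plot can be reproduced by hand by splitting a sample's SHAP values by sign and ordering each side by magnitude. The sketch below does this for the SHAP vector printed for sample 1 in section 4.2. It is a plain-Python illustration of the reading rules above, not the plotting code the article uses.

```python
feature_names = ["age", "workclass", "education_num", "marital_status",
                 "occupation", "relationship", "race", "sex", "capital_gain",
                 "capital_loss", "hours_per_week", "native_country"]

# SHAP vector printed for sample 1 in section 4.2.
values = [0.34912622, -0.16633348, 0.65308005, 0.3069151, 0.26878497,
          0.5229906, 0.01030679, 0.04531586, -0.15429462, -0.06718991,
          -0.9804511, 0.00515459]

pairs = list(zip(feature_names, values))

# Red side: positive contributions, largest first.
pushes_up = sorted((p for p in pairs if p[1] > 0), key=lambda p: -p[1])
# Blue side: negative contributions, most negative first.
pushes_down = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])

print("pushes up:  ", [name for name, _ in pushes_up])
print("pushes down:", [name for name, _ in pushes_down])
```

For this sample the strongest downward push comes from `hours_per_week`, whose feature value in the `.data` printout is 13 hours.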

      (1) Force plot for a single sample: compare with the prediction

      Current test sample: 0

      mode_exp_value: -1.3270257
      <IPython.core.display.HTML object>
      Current test sample: 0 
       age               29.0
      workclass          4.0
      education_num      9.0
      marital_status     4.0
      occupation         1.0
      relationship       3.0
      race               2.0
      sex                0.0
      capital_gain       0.0
      capital_loss       0.0
      hours_per_week    60.0
      native_country    39.0
      y_val_predi        0.0
      y_val              0.0
      Name: 11311, dtype: float64
      True label of the current test sample: 0
      Predicted probability of the current test sample: 0

      Current test sample: 1

      Current test sample: 1 
       age                 33.0
      workclass            4.0
      education_num       10.0
      marital_status       4.0
      occupation           3.0
      relationship         1.0
      race                 2.0
      sex                  1.0
      capital_gain      8614.0
      capital_loss         0.0
      hours_per_week      40.0
      native_country      39.0
      y_val_predi          1.0
      y_val                1.0
      Name: 12519, dtype: float64
      True label of the current test sample: 1
      Predicted probability of the current test sample: 1

      Current test sample: 5

      Current test sample: 5 
       age               45.0
      workclass          4.0
      education_num     10.0
      marital_status     2.0
      occupation         4.0
      relationship       0.0
      race               4.0
      sex                1.0
      capital_gain       0.0
      capital_loss       0.0
      hours_per_week    40.0
      native_country    39.0
      y_val_predi        1.0
      y_val              0.0
      Name: 4319, dtype: float64
      True label of the current test sample: 0
      Predicted probability of the current test sample: 1

      Current test sample: 7

      Current test sample: 7 
       age               60.0
      workclass          0.0
      education_num     13.0
      marital_status     2.0
      occupation         0.0
      relationship       0.0
      race               4.0
      sex                1.0
      capital_gain       0.0
      capital_loss       0.0
      hours_per_week     8.0
      native_country    39.0
      y_val_predi        0.0
      y_val              1.0
      Name: 4721, dtype: float64
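      True label of the current test sample: 1
      Predicted probability of the current test sample: 0

      (2) Force plot for multiple samples

      # (2.1) Force plot of feature contributions: visualize the explanations of the first 5 predictions in deep red and deep blue; the X dataset can be used here.

      # (2.2) Force plot of misclassified samples: this must use the X_val dataset, because it involves the model's predictions.
      To explain several samples at once, each single-sample plot is rotated by 90 degrees and the plots are placed side by side horizontally, which gives a variant of the force plot.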
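Before plotting misclassified samples, they have to be selected from the validation set. A minimal sketch, using the index, `y_val_predi` and `y_val` values from the prediction table in section 3.3 (the index values are read from that table's flattened rows); the frame name `df_val` is hypothetical.

```python
import pandas as pd

# Index, prediction and true label taken from the prediction table in 3.3.
df_val = pd.DataFrame(
    {"y_val_predi": [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
     "y_val":       [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]},
    index=[11311, 12519, 29225, 5428, 2400, 4319, 26564, 4721, 19518, 25013],
)

# A sample is misclassified when prediction and true label disagree.
mask = df_val["y_val_predi"] != df_val["y_val"]
misclassified = df_val[mask]

# Row positions (0-based) are what positional SHAP indexing needs.
positions = [i for i, wrong in enumerate(mask) if wrong]

print(misclassified)
print("positions:", positions)
```

The two positions found this way correspond to test samples 5 and 7 shown in the single-sample printouts above.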

      # 5.2 Decision plot analysis: how the model arrives at its decisions

      # (1) Decision plot for a single sample
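A decision plot for one sample draws a line that starts at the base value and adds one feature's SHAP value at a time until it reaches the model output. The sketch below computes that path by hand for the SHAP vector printed for sample 20 in section 4.2. Ordering features by absolute contribution is an assumption that roughly mirrors the default importance ordering of such plots.

```python
import numpy as np

feature_names = np.array(["age", "workclass", "education_num", "marital_status",
                          "occupation", "relationship", "race", "sex",
                          "capital_gain", "capital_loss", "hours_per_week",
                          "native_country"])

# SHAP vector and base value printed for sample 20 in section 4.2.
values = np.array([0.31008577, 0.00316932, 1.3133987, 0.16768128, 0.18239255,
                   0.6863757, 0.00508371, 0.05159741, -0.15813455, -0.06736177,
                   0.31327826, 0.01936885])
base_value = -1.3270257

# Order features from least to most influential for this sample,
# so the largest moves come last (the top of the plot).
order = np.argsort(np.abs(values))

# The path starts at the base value and adds one contribution at a time.
path = base_value + np.cumsum(values[order])

for name, point in zip(feature_names[order], path):
    print(f"{name:<15} {point:8.4f}")
```

The last point of the path is the model's raw output for the sample, which ties the decision plot back to the same additivity that force plots rely on.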

      # (2) Decision plot for multiple samples

      # (2.1) Decision plot for a subset of samples

      # (2.2) Decision plot for misclassified samples
