Abstract 3
2 Introduction 4
3 Diabetes data set 7
3.1 Description of the dataset 7
3.2 Data preprocessing 8
4 Medical detection model 10
4.1 One-Class SVM algorithm 10
4.2 Prediction Model Results and Analysis 14
4.3 Experimental Results and Analysis of the Prediction Model 14
4.3.1 Evaluation Metrics 14
4.3.2 Experimental Results 16
5 Medical detection model interpretability methods 18
5.1 SHAP algorithm 18
5.1.1 Shapley values in cooperative games 18
5.1.2 SHAP algorithm in machine learning 19
5.1.3 Feature function of SHAP algorithm 20
5.1.4 Compute SHAP values 23
5.2 τ-algorithm 25
5.2.1 τ-values in cooperative games 25
5.2.2 τ-values algorithm 26
6 Diabetes detection model interpretation methods 28
6.1 SHAP model and result analysis 28
6.1.1 Visualization of Predictions 28
6.1.2 SHAP Feature Importance 29
6.1.3 SHAP Summary Plot 31
6.2 τ-values model and result analysis 33
6.2.1 τ-values Feature Importance 33
6.3 Model Comparison 34
7 Predictive model based on XGBoost 36
7.1 XGBoost 36
7.2 XGBoost prediction model 39
7.2.1 A prediction model based on Shapley 39
7.2.2 A prediction model based on τ-values 41
7.2.3 Interpretable Model Comparison 42
8 Conclusion 44
References 46
9 Appendix 50
In the field of artificial intelligence, the interpretability of models has always been a focal point for researchers and engineers. With the widespread application of machine learning models across various domains, understanding the decision-making process of a model has become an important topic. This paper uses machine learning to establish a medical detection model and conducts interpretability research on this model. The main algorithms used are the SHAP algorithm and the τ-algorithm, exploring how different cooperative-game methods perform in interpretability research on the same model and data. Furthermore, the medical detection model is reconstructed using Shapley values and τ-values based on the XGBoost model, and the strengths and weaknesses of the two methods are analyzed and compared. Based on this analysis, improvements are made to the model to enhance the credibility of the prediction results. The τ-algorithm used in this paper is novel in the fields of machine learning and detection models.
This paper primarily studies diabetes prediction models based on machine learning from the perspective of cooperative game theory. The main challenge for medical prediction models lies in their interpretability: most machine-learning-based prediction models are "black-box" models, making it difficult to explain their results in a way that convinces doctors and patients of their reliability. To address this issue, it is essential to develop a medical prediction model that can both diagnose diabetes and analyze the diagnostic process, providing explanations for its results.
The process begins with the creation of a diabetes prediction model based on One-Class SVM. The dataset must first be analyzed and preprocessed to avoid any adverse effects on the prediction results. The model is then trained, prediction results are generated, and the model is evaluated. After the predictions are made, the results are interpreted to explain the feasibility of the model, thereby increasing the confidence of doctors and patients in the model's predictions.
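As an illustration of this pipeline, a minimal sketch in Python with scikit-learn is given below. The file name diabetes.csv, the Outcome label column, and the hyperparameters (nu, gamma) are assumptions made for the example, not the thesis's actual configuration.

```python
# Sketch of the One-Class SVM detection pipeline described above (assumed
# dataset layout: feature columns plus a binary "Outcome" column, 1 = diabetic).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM
from sklearn.metrics import classification_report

df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale features so no single attribute dominates the RBF kernel.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Fit only on the non-diabetic class; diabetic cases are then flagged as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.3)
ocsvm.fit(X_train_s[(y_train == 0).to_numpy()])

# OneClassSVM predicts +1 for inliers and -1 for outliers; map -1 to label 1.
y_pred = (ocsvm.predict(X_test_s) == -1).astype(int)
print(classification_report(y_test, y_pred))
```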
The explanation process is divided into two main steps. First, Shapley values and τ-values are computed for the predictive model from the perspective of cooperative game theory; both of these values reflect the importance of features. After the specific values are obtained, statistical plots over instances and features are drawn so that the influence of different features on the overall prediction results can be observed intuitively. The feature importance then explains where the results of the "black-box" predictive model come from.
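Continuing the sketch above, the SHAP part of this step could be computed with the shap package's model-agnostic KernelExplainer; the background sample size, the choice of decision_function as the explained output, and the variable names carried over from the previous sketch are illustrative assumptions.

```python
# Explanation step: per-instance SHAP values and a global importance plot.
import shap

# A small background sample keeps the kernel estimate tractable.
background = shap.sample(X_train_s, 100)
explainer = shap.KernelExplainer(ocsvm.decision_function, background)

# One SHAP value per feature per instance: positive values push the score
# toward "inlier" (non-diabetic), negative values toward "outlier" (diabetic).
shap_values = explainer.shap_values(X_test_s[:50])

# Global feature importance: mean absolute SHAP value per feature.
shap.summary_plot(shap_values, X_test_s[:50],
                  feature_names=list(X.columns), plot_type="bar")
```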
Shapley values are a commonly used method for studying interpretability, whereas τ-values are applied here for the first time. To investigate whether τ-values can serve as an interpretability method, their feasibility as an explanatory tool must be studied. Comparing the feature importance obtained from τ-values with that obtained by the SHAP method reveals similar conclusions, indicating that τ-values can be used as a method for research on interpretable AI, although the τ-value feature-importance scores are less pronounced.
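For context, assuming the τ-value referred to here is the Tijs (1981) compromise value from cooperative game theory (the thesis's exact variant may differ), its standard definition for a game $(N, v)$ can be sketched as follows, in textbook notation rather than the thesis's own symbols:

```latex
% Utopia payoff and minimal right of player i:
M_i(v) = v(N) - v(N \setminus \{i\}), \qquad
m_i(v) = \max_{S \ni i} \Bigl( v(S) - \sum_{j \in S \setminus \{i\}} M_j(v) \Bigr),
% The tau-value is the efficient compromise between m(v) and M(v):
\tau_i(v) = m_i(v) + \alpha \bigl( M_i(v) - m_i(v) \bigr),
\quad \alpha \in [0,1] \ \text{chosen so that} \ \sum_{i \in N} \tau_i(v) = v(N).
```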
Medical detection models were then reconstructed using τ-values and SHAP values based on XGBoost. Both detection models exhibit high accuracy and predictive value, with similar results. Therefore, τ-values can be fully applied to research on interpretable AI, and they can be used alongside SHAP values to construct medical detection models.
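A minimal sketch of such a reconstruction, assuming an XGBoost classifier explained with SHAP's tree-specific explainer (hyperparameters and variable names are carried over from the earlier sketches and are not the thesis's exact settings; the τ-value counterpart of this step is not reproduced here):

```python
# XGBoost-based detection model plus SHAP explanation of its predictions.
import shap
import xgboost as xgb

xgb_clf = xgb.XGBClassifier(n_estimators=200, max_depth=4,
                            learning_rate=0.1, eval_metric="logloss",
                            random_state=0)
xgb_clf.fit(X_train, y_train)
print("test accuracy:", xgb_clf.score(X_test, y_test))

# TreeExplainer computes exact SHAP values for tree ensembles efficiently.
tree_explainer = shap.TreeExplainer(xgb_clf)
xgb_shap = tree_explainer.shap_values(X_test)
shap.summary_plot(xgb_shap, X_test, plot_type="bar")
```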
1. Zou J, Xu F, Zhang Y, et al. High-Dimensional Explainable AI for Cancer Detection[J]. 2021.
2. Lundberg S, Lee S I. A Unified Approach to Interpreting Model Predictions[J]. 2017. DOI: 10.48550/arXiv.1705.07874.
3. Lundberg S M, Erion G G, Lee S I. Consistent Individualized Feature Attribution for Tree Ensembles[J]. 2018. DOI: 10.48550/arXiv.1802.03888.
4. HUANG Yuteng, PEI Xubin, KONG Libo, LI Bo, YIN Jie. Research on power outlier user detection algorithm based on ant colony algorithm improved One-Class SVM[J]. Automation and Instrumentation, 2019(5): 111-114. DOI: 10.14016/j.cnki.1001-9227.2019.05.111.
5. Huang Meihuaju. Research and implementation of an explainable AI diagnosis system for thyroid nodules[D]. Donghua University, 2021. DOI: 10.27012/d.cnki.gdhuu.2021.001151.
6. ZHONG K T. Research and application of explainable AI diagnostic model for breast tumor[D]. Donghua University, 2021. DOI: 10.27012/d.cnki.gdhuu.2021.000439.
7. Jing Jie, WANG Beilei, LIU Shanrong. Interpretable application of artificial intelligence in disease diagnosis and treatment[J]. Laboratory Medicine, 2019, 36(09): 976-980. (in Chinese)
8. Yuan Weilin, Luo Junren, Lu Lina, et al. Intelligent game confrontation methods: analysis from the perspectives of game theory and reinforcement learning[J]. Journal of Computer Science, 2022, 49(8): 14. DOI: 10.11896/jsjkx.220200174.
9. Lenatti M, Carlevaro A, Keshavjee K, Guergachi A, Paglialonga A, Mongelli M. Characterization of Type 2 Diabetes Using Counterfactuals and Explainable AI. Stud Health Technol Inform. 2022 May 25; 294: 98-103. doi: 10.3233/SHTI220404. PMID: 35612024.
10. Castro J, Gomez D, Tejada J. Polynomial calculation of the Shapley value based on sampling[J]. Computers and Operations Research, 2009, 36(5): 1726-1730.
11. Cho YR, Kang M. Interpretable machine learning in bioinformatics. Methods. 2020 Jul 1;179:1-2. doi: 10.1016/j.ymeth.2020.05.024. Epub 2020 May 30. PMID: 32479800.
12. Adadi A, Berrada M. Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI)[J]. IEEE Access, 2018, 6: 52138-52160.
13. Shapley L S. A value for n-person games[J]. Contributions to the Theory of Games, 1953,2(28): 307-317.
14. Balakrishnama S, Ganapathiraju A. Linear discriminant analysis - a brief tutorial[C]//Institute for Signal and Information Processing. 1998, 18(1998): 1-8.
15. Hou N, Li M, He L, Xie B, Wang L, Zhang R, Yu Y, Sun X, Pan Z, Wang K. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020 Dec 7; 18(1): 462. doi: 10.1186/s12967-020-02620-5. PMID: 33287854; PMCID: PMC7720497.