Classification of Alzheimer’s Disease Stages using Radiology Imaging and Longitudinal Clinical Data – Part 6

Build a Classification Model using Tree-Based Algorithms

In this implementation, the features selected to build the classification model are the subject’s education, gender, age and a small set of biomarkers discussed in earlier studies (Li et al., 2017; Goyal et al., 2018; Azvan et al., 2018). The biomarkers are cognitive tests, namely Clinical Dementia Rating Sum of Boxes (CDRSB) and Mini-Mental State Exam (MMSE); MRI measures of the whole brain, hippocampus, middle temporal gyrus and entorhinal cortex; PET measures of fludeoxyglucose (FDG) and florbetapir (AV45); and cerebrospinal fluid (CSF) measures of amyloid-beta, tau and phosphorylated tau levels. A new feature is derived from age to reflect the subject’s age at each clinic visit. Gender and education are converted into dummy variables. Missing values are imputed and all values are then scaled. The data is then divided into training and test sets. The model is trained on the training set and evaluated on the test set. This is a continuation from here.
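
The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code used in the study: the column names, the way age at visit is derived (baseline age plus years since baseline), the median imputation strategy and the 75/25 split are all assumptions. The sketch also fits the imputer and scaler on the training split only, which differs from the order described above but keeps test-set information out of the fitted transforms.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the clinical table; every column name is hypothetical.
df = pd.DataFrame({
    "age_at_baseline": rng.normal(73, 7, n),
    "years_since_baseline": rng.choice([0.0, 0.5, 1.0, 2.0], n),
    "gender": rng.choice(["Male", "Female"], n),
    "education": rng.choice(["12", "16", "18"], n),
    "CDRSB": rng.uniform(0, 10, n),
    "MMSE": rng.integers(15, 31, n).astype(float),
    "diagnosis": rng.choice(["NL", "MCI", "Dementia"], n),
})
# Blank out a few values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "MMSE"] = np.nan

# New feature: age at the time of the visit (assumed derivation).
df["age_at_visit"] = df["age_at_baseline"] + df["years_since_baseline"]

# Dummy-encode gender and education; keep the remaining numeric features.
X = pd.get_dummies(
    df.drop(columns=["diagnosis", "age_at_baseline", "years_since_baseline"]),
    columns=["gender", "education"],
    dtype=float,
)
y = df["diagnosis"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the imputer and scaler on the training split, then apply to both splits.
imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
X_train_prep = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_prep = scaler.transform(imputer.transform(X_test))

print(X_train_prep.shape, X_test_prep.shape)
```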

Implementation, Evaluation and Results of Decision Tree using Different Numbers of Leaf Nodes

A decision tree is a non-linear, non-parametric algorithm that uses a tree-like graph in which each branch is the outcome of a conditional test and each leaf node is a class label. Its advantages are that it is easy to understand, it can identify relationships between two or more features, and it handles both numeric and categorical features. However, decision tree learners can create complex trees that overfit, and they are unstable because small variations in the data can produce very different trees. These weaknesses are addressed by ensemble methods such as bagging and boosting. The model is implemented with the scikit-learn library using DecisionTreeClassifier() and trained with 5, 50, 500, 5,000 and 50,000 leaf nodes. The average AUROC score is 0.845 with 5 leaf nodes, 0.850 with 50, 0.78 with 500, and 0.795 with both 5,000 and 50,000. Performance peaks at 50 leaf nodes and drops once the tree is allowed 500 or more, which is consistent with the larger trees overfitting.
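
A sweep over leaf-node limits like the one above might look like the sketch below. It runs on synthetic three-class data, so the scores will not match the reported ones. Two assumptions are made: that the leaf-node limit is set through the max_leaf_nodes parameter, and that "average AUROC" means the macro-averaged one-vs-rest AUROC.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for NL / MCI / dementia.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

auroc_by_leaves = {}
for max_leaves in [5, 50, 500, 5000, 50000]:
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaves, random_state=0)
    tree.fit(X_train, y_train)
    proba = tree.predict_proba(X_test)
    # Macro-averaged one-vs-rest AUROC over the three classes (assumed metric).
    auroc_by_leaves[max_leaves] = roc_auc_score(
        y_test, proba, multi_class="ovr", average="macro"
    )
    print(f"max_leaf_nodes={max_leaves:>6}: "
          f"leaves used={tree.get_n_leaves():>4}, "
          f"AUROC={auroc_by_leaves[max_leaves]:.3f}")
```

Once the limit exceeds the number of leaves a fully grown tree needs, raising it further does not change the tree, which is consistent with the identical scores reported for 5,000 and 50,000 leaf nodes.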

Implementation, Evaluation and Result of Random Forest

Random forest and gradient boosting trees are ensemble learning methods that combine the outputs of multiple individual trees. Random forest trains each tree independently on a random sample of the data, while gradient boosting builds each new tree to correct the errors of the previous trees. Random forest is less likely to overfit than gradient boosting trees. However, it is slow to make predictions, it is biased towards features with more levels, and when features are correlated, smaller groups of correlated features are favored over larger ones. The model is implemented in scikit-learn using RandomForestClassifier(). When the threshold is not fixed, it separates normal from dementia and MCI with an AUROC score of 0.728, MCI from normal and dementia with an AUROC score of 0.60, and dementia from normal and MCI with an AUROC score of 0.877.
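
The per-class scores above are one-vs-rest AUROC values: each class in turn is treated as the positive class and the other two as negative. A minimal sketch of that calculation with RandomForestClassifier() is given below. The data is synthetic and the hyperparameters (100 trees, fixed random seed) are assumptions, so the printed scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

class_names = ["NL", "MCI", "Dementia"]  # labels 0, 1, 2 in the synthetic data

X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
proba = forest.predict_proba(X_test)  # one probability column per class

per_class_auroc = {}
for k, name in enumerate(class_names):
    # Class k is positive, the other two classes are negative.
    # AUROC uses the predicted probabilities directly, so no threshold is fixed.
    is_class_k = (y_test == k).astype(int)
    per_class_auroc[name] = roc_auc_score(is_class_k, proba[:, k])
    print(f"{name} vs rest: AUROC = {per_class_auroc[name]:.3f}")
```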

Implementation, Evaluation and Result of XGBoost

Gradient boosting trees can also solve ranking problems, since any objective for which a gradient can be written can be optimized, but they take a long time to train because the trees are built sequentially. Extreme Gradient Boosting (XGBoost) implements the gradient boosting algorithm with parallel computing. It includes regularization to reduce overfitting and built-in methods for handling missing values and for cross-validation. The model is implemented with the XGBoost library using XGBClassifier() with the number of estimators set to 100.
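
The sequential nature of boosting can be seen by tracking the training loss as trees are added. The sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in so that it runs without the XGBoost dependency; it is not the XGBoost implementation used here. With XGBoost installed, XGBClassifier(n_estimators=100) offers the same fit/predict_proba interface. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Stand-in for XGBClassifier(n_estimators=100): 100 sequential boosting rounds.
booster = GradientBoostingClassifier(n_estimators=100, random_state=0)
booster.fit(X, y)

# staged_predict_proba yields the ensemble's predictions after each round,
# so the list below traces how every added round of trees reduces the loss.
staged_loss = [log_loss(y, proba) for proba in booster.staged_predict_proba(X)]

print(f"training log loss after   1 round : {staged_loss[0]:.3f}")
print(f"training log loss after 100 rounds: {staged_loss[-1]:.3f}")
```

The falling training loss only shows that each round corrects residual errors of the previous ones; held-out data is still needed to judge overfitting.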

The figure is a normalized confusion matrix. Its diagonal shows the proportion of each class that is predicted correctly: 0.90 for normal (NL), 0.38 for MCI and 1.0 for dementia.

The off-diagonal elements are the cases that the model confuses with other classes. The model is therefore better at classifying the normal and dementia stages than MCI. The value of 1.0 for dementia suggests that the model overfits when the threshold is fixed at 0.5, so the threshold should be selected carefully.
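
For reference, a row-normalized confusion matrix of this kind can be computed as in the sketch below. The true and predicted labels are small made-up arrays chosen for illustration; they are not the study's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["NL", "MCI", "Dementia"]  # encoded as 0, 1, 2

# Hypothetical true and predicted labels for twelve subjects.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 1, 1, 2, 2, 2, 2])

# normalize="true" divides each row by the number of true cases in that class,
# so the diagonal is the fraction of each class predicted correctly.
cm = confusion_matrix(y_true, y_pred, normalize="true")

for name, row in zip(labels, cm):
    print(f"{name:>8}: " + "  ".join(f"{value:.2f}" for value in row))
```

Each row sums to 1, and the off-diagonal entries in a row show where that class's misclassified cases went.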

The model separates normal from dementia and MCI with an AUROC score of 0.908, MCI from normal and dementia with an AUROC score of 0.659, and dementia from normal and MCI with an AUROC score of 0.853.

The figure shows the ROC curves. The classifier is better at separating normal from the other two classes than at predicting dementia or MCI.

Comparison of Developed Models

The figure shows the AUROC score per class for the developed models.

The decision tree with 50 leaf nodes shows the best performance, but it tends to overfit. XGBoost is the best model for distinguishing one class from the others, with AUROC scores of 0.908 for normal, 0.659 for MCI and 0.853 for dementia. The comparison also shows that all the algorithms achieve a higher AUROC score for normal and dementia than for MCI.

Interpreting Machine Learning Model

The figure shows the interpretation of the model’s predictions. It shows which features are important to the model by plotting the SHAP values of every feature for every sample, and it ranks the features by their mean absolute SHAP value. The CDR-SB score is the most important feature, with a mean absolute SHAP value of 0.35. It is used to stage the severity of dementia and MCI accurately. MRI of the whole brain is the next most important feature, followed by another cognitive test, MMSE, with mean values of 0.12 and 0.11 respectively. Gender has no impact on the model. This contrasts with the finding of one study that the accuracy of the prediction depends on the sex of the patient (Lee et al., 2016). Further, education has only a small impact on the model’s predictions.
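
The ranking step itself is a simple aggregation: take the absolute SHAP value of each feature for each sample and average over samples. The sketch below shows only that aggregation, using NumPy on a randomly generated matrix in place of real SHAP values; the feature names and magnitudes are hypothetical and do not reproduce the figure.

```python
import numpy as np

feature_names = ["CDRSB", "whole_brain_MRI", "MMSE", "gender"]  # hypothetical

rng = np.random.default_rng(0)
# Stand-in for a (samples x features) matrix of SHAP values. Each column is
# scaled differently so the features differ in how strongly they move the output.
column_scale = np.array([1.0, 0.4, 0.3, 0.0])
shap_values = rng.normal(0.0, 1.0, size=(500, 4)) * column_scale

# Global importance of a feature = mean of |SHAP value| over all samples.
mean_abs_shap = np.abs(shap_values).mean(axis=0)

ranking = sorted(
    zip(feature_names, mean_abs_shap), key=lambda pair: pair[1], reverse=True
)
for name, importance in ranking:
    print(f"{name:>16}: {importance:.3f}")
```

A feature whose SHAP values are zero for every sample, like the last column here, ends up with zero importance, which is how a feature with no impact on the model appears in such a plot.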

The report continues here.
