Classification of Alzheimer’s Disease Stages using Radiology Imaging and Longitudinal Clinical Data – Part 2

Literature Review of Alzheimer’s Disease Progression


This section discusses the papers published between 2004 and 2019 regarding Alzheimer’s disease progression. It starts by reviewing the application of machine learning on radiology images followed by different tests that are used to measure the clinical stage of the progression. This post is a continuation from this page.

Clinical data sets are also discussed to find the most suitable data set for the project. Various features, feature engineering and feature selection techniques used by the researchers are examined to gain knowledge. Finally, a critique of machine learning algorithms, techniques and evaluation metrics is conducted to identify the gaps. The key findings are that the Alzheimer’s Disease Neuroimaging Initiative (ADNI) is the most common data set for longitudinal studies, that biomarkers from neuroimaging are well suited to developing a model, and that multiple machine learning algorithms have been applied to classify the different stages of the disease.

However, an ensemble of Extreme Gradient Boosting (XGBoost) and other classifiers has not been applied. Further, none of the models developed are deployed as a web-based application.

A Review of Radiology Images and Identified Gaps

A recent review of more than 30 papers shows the use of machine learning on radiology images for detecting Alzheimer’s disease and predicting conversion to it (Nanni et al., 2016). The techniques recognize and classify complex patterns in various images to support clinical decisions with performance comparable to that of humans. Furthermore, the application of machine learning to radiology images is expected to grow in the next 5 to 10 years because of the active research (Zhang and Sejdić, 2019). Different neuroimaging techniques help to provide the primary markers of brain pathology (Masdeu et al., 2005). It has also been demonstrated that early detection using MRI helps in treatment that delays the progression of Alzheimer’s disease (Lahmiri and Shmuel, 2018). MRI is a non-invasive and highly sensitive imaging scan of the brain. It is employed to visualize the anatomical structure of the brain in a routine clinical environment. It also produces high spatial resolution and image detail (Wang et al., 2018). However, relying on MRI from a single time point can miss subtle abnormalities that develop over time. Hence, a longitudinal analysis of MRI is important (Cui and Liu, 2019). Moreover, existing models require assumptions about trajectories or discount the relationship between a patient’s trajectories and multiple biomarkers. Only a few papers combined non-image data with image data to properly capture the interaction. The approaches also failed to quantify the degree of abnormality on a normal scale (Aditya and Pande, 2017) and ignored chronological dependencies.

A Review of Tests to Measure Alzheimer’s Disease Progression

This section discusses the different tests used to measure Alzheimer’s disease progression to ensure the tests are recognized and then used as features.

Different tests act as tools to evaluate the progression. They draw on health states, events such as death and institutionalization, and mathematical approaches such as hazard ratios and probabilities. Some models, e.g., Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) and Fenn and Gray, measure transitions between levels of disease severity or changes in cognitive function. Other models, e.g., CERAD Mini-Mental State Exam (MMSE) and Kinosian, include placement in an institution, while the Assessment of Health Economics in Alzheimer’s Disease (AHEAD) model considers full-time care. Statistical models additionally use a diverse range of measurement scales, e.g., MMSE and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), to describe the natural progression of Alzheimer’s disease. ADAS-Cog covers multiple cognitive domains such as language and memory and is the most applied test (Skinner et al., 2012). However, cognition alone is not a sufficient indicator because it fails to consider the effect of healthcare and costs (Green et al., 2011). Papers use ADAS-Cog and other biomarkers to calculate the progression at each time point (Fisher et al., 2018), (Young et al., 2014a), (Yang et al., 2011). Other tests used are MMSE (Doody et al., 2010), Global Staging Clinical Dementia Rating (CDR) (Wang et al., 2018) or custom metrics based on the age of the patient to measure the start of the symptoms of Alzheimer’s disease (Bateman et al., 2012). However, these tests divide the continuous progression into discrete stages, so the precise duration cannot be measured. To overcome this issue, one paper uses the exponential-shaped trajectory of the ADAS-Cog score as a continuous factor (Schmidt-Richberg et al., 2016). In conclusion, no single test measures the progression. Tests can be combined with other biomarkers to predict the progression across the timeline.

A Review of Data Sets Used

Longitudinal data evolve with time and have no fixed shape. They are used to gain information about the change between two successive time points. Researchers utilize a variety of data for predicting the progression of Alzheimer’s disease. Nevertheless, there is a need for longitudinal data on which machine learning can be performed (Fisher et al., 2018). Longitudinal data are also often incomplete (Mehdipour Ghazi et al., 2019). In addition, studies often work with a small subset of data, which leads to a less accurate characterization of stages than large data sets allow (Goyal et al., 2018).
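To make the idea of change between successive time points concrete, the following is a minimal sketch and not taken from any of the reviewed papers. The patient identifiers, visit months and scores are invented for illustration, and the function name successive_changes is hypothetical. It computes the score change per month between consecutive visits and tolerates a missing visit, which reflects the incompleteness noted above.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, months_since_baseline, cognitive_score).
visits = [
    ("p1", 0, 18.0),
    ("p1", 12, 21.5),
    ("p1", 24, 27.0),
    ("p2", 0, 10.0),
    ("p2", 24, 12.0),  # the 12-month visit is missing for this patient
]


def successive_changes(records):
    """Return, per patient, the score change per month between consecutive visits."""
    by_patient = defaultdict(list)
    for patient_id, month, score in records:
        by_patient[patient_id].append((month, score))

    changes = {}
    for patient_id, series in by_patient.items():
        series.sort()  # order the visits by month
        changes[patient_id] = [
            (later_score - earlier_score) / (later_month - earlier_month)
            for (earlier_month, earlier_score), (later_month, later_score)
            in zip(series, series[1:])
        ]
    return changes


rates = successive_changes(visits)
for patient_id, patient_rates in rates.items():
    print(patient_id, [round(rate, 3) for rate in patient_rates])
```

Dividing by the gap in months keeps the rates comparable when visits are unevenly spaced, which is one simple way to cope with missing time points.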

The data used for research are as follows:

• Creating a customized cohort, as in a study that selected participants using neuropsychological evaluations (Pereira et al., 2017).

• Competitions like Computer-Aided Diagnosis of Dementia (CADDementia) or Kaggle (Vieira et al., 2017).

• Collaborating with a single facility such as The Baylor Alzheimer’s Disease and Memory Disorders (Doody et al., 2010), the National Alzheimer’s Coordinating Center (Bateman et al., 2012) or the UCSF Memory and Aging Center (Zhou et al., 2012), or with multiple centres, e.g., the Coalition Against Major Diseases (CAMD) database (Fisher et al., 2018) or The Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Young et al., 2014a), (Goyal et al., 2018), (Venkatraghavan et al., 2019), (Kruthika et al., 2019b).

ADNI is the most common data set used in longitudinal studies for tracking the disease progression (Zhang et al., 2017). Nevertheless, the data sets extracted from ADNI vary between studies, with different numbers of patients and features. For example, one study uses longitudinal scans of cerebral cortex thickness taken 1 year apart for 132 patients with MCI (Lee et al., 2016), while other studies use T1-weighted MRI of 830 patients (Cui and Liu, 2019) or of 445 patients at various stages of the disease (Wang et al., 2019). There is no common data set for determining the stages of Alzheimer’s disease. The multi-sharing initiative and the TADPOLE challenge can provide a large enough sample size and a common data set to obtain consistent results.

An Investigation of Features, Feature Engineering and Feature Selection

This section discusses the various features, feature engineering and feature selection techniques that are used by different papers. The aim is to develop knowledge of the features that can be used to develop the model. MRI provides information about brain tissues, e.g., grey and white matter, in a non-invasive manner. Studies utilize features from MRI such as brain matter and specific thinning of the cortex (Aditya and Pande, 2017) or “volumes of ventricles, hippocampus, whole brain, fusiform, middle temporal gyrus, and entorhinal cortex” (Mehdipour Ghazi et al., 2019) to build models. Another study suggests using measurements of cerebrospinal fluid, amyloid and tau, neural injury, etc. obtained from MRI or positron emission tomography (PET) images (Nanni et al., 2016). However, MRI, computed tomography (CT) and PET images differ. For example, a white area in MRI is subcutaneous fat, while in CT images it is the skull. PET images show both biochemical and physiological changes, while MRI and CT capture anatomical changes. Hence, it is important to be aware of these differences when using these scans. The features are not limited to colour and shape. The Gabor filter remains the most common method for extracting features from medical images. Further, the medical images need to be of good quality for the features to be extracted.
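As an illustration of the Gabor filter mentioned above, here is a minimal sketch in plain Python of the standard real-valued Gabor kernel: a Gaussian envelope multiplied by a cosine carrier. The function name gabor_kernel and all parameter values are assumptions chosen for the example and are not taken from the reviewed papers. In practice the kernel would be convolved with the image at several orientations and wavelengths, and statistics of the responses would be used as texture features.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Build a real-valued Gabor kernel as a size x size list of lists.

    sigma: width of the Gaussian envelope; theta: orientation in radians;
    wavelength: period of the cosine carrier; gamma: spatial aspect ratio;
    psi: phase offset of the carrier.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that the kernel has a centre")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates into the orientation of the filter.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(
                -(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2)
            )
            carrier = math.cos(2 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Illustrative parameter values only.
kernel = gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0)
```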

Transfer learning is also an effective technique that can reduce the bias introduced by different equipment when producing an image. Abnormalities from the same data source can also arise under various scenarios, which results in imbalanced data, and how to handle the imbalance is still an open research issue (Zhang and Sejdić, 2019). One study states that changes in a specific nerve cell receptor, i.e., the N-methyl-D-aspartate receptor, are linked to the severity of the disease (Mishizen-Eberz et al., 2004). Another paper states that the deterioration of the hippocampus and entorhinal cortex can establish the onset of dementia. It uses data from ADNI1 and ADNI2 and notes that the two data sets used MRI scanners with different field strengths. Combining both data sets therefore affected the study (Moscoso et al., 2019). Papers also suggest the use of other features in addition to radiology imaging, e.g., the patient’s age, the age of the patient’s parents, short-term follow-up data (Bilgel and Jedynak, 2018), education and socio-economic status (Wang et al., 2019). One study uses “fractals from MRI of cerebral cortex, cortical thickness, gyrification index and ADAS-Cog test scores” to distinguish between healthy subjects and patients with Alzheimer’s disease (Lahmiri and Shmuel, 2018). Another paper uses biomarkers such as genetics and cognitive measurements in conjunction with the corpus callosum. The corpus callosum interconnects the cerebral hemispheres, and longitudinal structural callosal changes extracted from MRI help to determine conversion from mild cognitive impairment (MCI) to dementia (Lee et al., 2016).

Nevertheless, it is a challenge to associate imaging features with static features at multiple time points. One study examines feature selection in single-task learning and multi-task learning. In single-task learning, progression is estimated separately at each time point, while multi-task learning handles multiple related tasks together. The study finds that single-task learning is suboptimal in predicting progression because each task is treated separately. In multi-task learning, the various tasks share a subset of features and every task is treated as equally related. However, the learning results in sparse data (Goyal et al., 2018).

In conclusion, features from radiology imaging such as the hippocampus and cerebral cortex are well suited to building a model, in addition to clinical data and information about the patient such as age, socio-economic status and cognitive score.

A Critique of Machine Learning Algorithms, Techniques and Evaluation Metrics Used and Identified Gaps

There are multiple algorithms or methods used to extract patterns with a correlation between the stages of Alzheimer’s disease and features. This section discusses some of the techniques to critique, compare and evaluate them. The aim is to gain an understanding to reproduce the techniques and identify gaps.

One study uses logistic regression and a defined common template for binary classification of whether a patient has the disease or not. Logistic regression is a simple, linear model and provides coefficient weights that localize the deformations related to the disease for each voxel. The study applies both kinds of regularization, least absolute shrinkage and selection operator (LASSO) and Ridge, to handle a wide data set with more features than observations (Fiot et al., 2014). Another paper uses a logistic regression model with fused LASSO regularization to predict the annual changes in callosal thickness. It states that the gender of the patient influences the accuracy of the prediction: the prediction is 84% accurate in females and 61% in males. Furthermore, the annual changes in callosal atrophy predict conversion from MCI to dementia more accurately in females than in males. The use of MMSE, ADAS or Rey Auditory Verbal Learning Test (RAVLT) at baseline did not help the prediction (Lee et al., 2016). However, the study is limited to its data set and cannot ensure that a patient in the normal or MCI group will not convert to dementia after the follow-up period. Survival analysis may benefit the study. Another paper applies multitask exclusive learning to predict markers for the disease. It utilizes information from adjacent time points to understand the intrinsic relationship among multiple cognitive measures without knowing them in advance. Least squares regression with LASSO regularization is applied to the data from each time point to accurately identify image markers. The paper states that certain biomarkers are highly associated with MMSE scores at multiple time points (Wang et al., 2019). However, the number of participants is limited and the study does not have complete information for each patient. The model performs well on MRI data, and adding other modalities, e.g., PET and CSF, could improve the performance.
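The difference between the two kinds of regularization can be sketched in a few lines. The example below is not the implementation of Fiot et al. (2014) or of any other reviewed paper. It is a plain-Python logistic regression without an intercept, trained by gradient descent, in which the Ridge penalty is added to the gradient and the LASSO penalty is applied through a soft-thresholding step (proximal gradient descent, chosen here for brevity). The function names, the toy data with more features than observations, and the penalty strength are all assumptions.

```python
import math


def soft_threshold(value, amount):
    """Proximal step for the L1 (LASSO) penalty: shrink towards zero, clip at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, targets, penalty, strength, step=0.1, epochs=500):
    """Fit logistic regression weights with a 'ridge' or 'lasso' penalty."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        # Average gradient of the logistic loss over all observations.
        gradient = [0.0] * n_features
        for row, target in zip(rows, targets):
            error = sigmoid(sum(w * x for w, x in zip(weights, row))) - target
            for j, x in enumerate(row):
                gradient[j] += error * x / len(rows)
        if penalty == "ridge":
            # L2 penalty: its gradient (strength * w) shrinks weights smoothly.
            weights = [w - step * (g + strength * w)
                       for w, g in zip(weights, gradient)]
        else:
            # L1 penalty: gradient step, then soft-thresholding to exact zeros.
            weights = [soft_threshold(w - step * g, step * strength)
                       for w, g in zip(weights, gradient)]
    return weights


# Toy "wide" data: 4 observations, 6 features (values are invented).
rows = [
    [1.0, 0.2, -0.5, 0.3, 0.0, -0.1],
    [0.9, -0.4, 0.1, -0.2, 0.5, 0.3],
    [-1.0, 0.3, 0.4, 0.1, -0.3, 0.2],
    [-0.8, -0.1, -0.2, 0.4, 0.2, -0.4],
]
targets = [1, 1, 0, 0]

ridge_weights = fit_logistic(rows, targets, "ridge", strength=1.0)
lasso_weights = fit_logistic(rows, targets, "lasso", strength=1.0)
print("ridge:", [round(w, 3) for w in ridge_weights])
print("lasso:", lasso_weights)
```

With a penalty this strong, the LASSO fit sets every weight exactly to zero, while the Ridge fit only shrinks the weights. A weaker LASSO penalty would typically keep a few non-zero weights, which is why it doubles as a feature selection method on wide data.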

Event-based modelling is also suitable for understanding the dynamics of progression. One study employs a discriminative approach to estimate an ordering of events for each subject and a central ordering for all subjects to create a longitudinal timeline (Venkatraghavan et al., 2019). Still, short-term trajectories of imaging and non-imaging parameters after disease progression are important to consider. Another paper develops a multivariate Bayesian model and a quantitative template to compute trajectories as a function of time and measure the similarity between longitudinal biomarkers. It prepares the model after aligning short-term longitudinal data and estimates the quantitative template for various stages. It can learn long-term trajectories from short-term clinical data and known risk factors, with a mean error in the onset of dementia of less than 1.5 years (Bilgel and Jedynak, 2018). However, it assumes a single pathway of biomarker changes and is difficult to generalize because each patient’s biomarkers are different.

One study uses non-image data to quantify abnormality and explore inter-feature relationships using similarity indices. It builds a reference knowledge base for normal patients and those with dementia, compares a patient against it and labels the patient as normal or demented according to the affinity to each class. Multifactor affiliation analysis is used to compare the feature values of training subjects and quantify the severity of the disease (Aditya and Pande, 2017). However, the technique measures quantified distances and is suitable for numerical data only. Support vector machine (SVM) is also used to detect the disease in computer-aided diagnosis systems. The kernel used for the SVM is linear because it provides coefficients and is simpler than a non-linear SVM (Lahmiri and Shmuel, 2018). Nevertheless, it is difficult to modify SVM to add spatial regularization. Another paper finds that an ensemble of SVMs performs better than a standalone SVM on five different data sets. Furthermore, certain feature selection approaches work differently for different data (Nanni et al., 2016). However, the findings still need to be validated on unseen data, and new techniques for building an ensemble remain to be tried.

Some papers experiment with different machine learning algorithms to find the best model. For example, four algorithms (linear discriminant analysis (LDA), k-nearest neighbours (kNN), naive Bayes (NB) and second-order polynomial SVM) are used to classify patients as normal or demented (Lahmiri and Shmuel, 2018), and learning algorithms such as SVM, random forest, regression and neural networks (NN) are used to predict the stages of the disease (Zhang and Sejdić, 2019).

Additionally, one study applies a multistage classifier that combines multiple machine learning methods, e.g., NB, SVM and NN, to classify Alzheimer’s disease more efficiently and effectively (Kruthika et al., 2019a). It uses correlation to eliminate redundant features and focuses on improving image retrieval from the smallest number of features by using transfer learning and capsule networks on MRI. Capsule networks require only a limited training set and have a lower learning curve. The main feature is the extraction of subtle changes in the texture and contour of the hippocampus using diverse techniques. It states that capsule networks estimate the stages with better accuracy (82.45%) than other NN architectures (Kruthika et al., 2019b). However, the prediction has more missing results than other methods, and using more capsule layers could improve the accuracy. The false alarm rate for the disease is also smaller than for other classes, and advanced biomarkers and biochemical information are required to get better performance.
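Using correlation to eliminate redundant features can be sketched as a greedy filter: a feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold. This is a generic illustration and not the procedure of Kruthika et al. The feature names, the values and the 0.95 threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(variance_a * variance_b)


def drop_redundant(features, threshold=0.95):
    """Greedily keep features that are not highly correlated with any kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[other])) < threshold
               for other in kept):
            kept.append(name)
    return kept


# Hypothetical measurements for five subjects.
features = {
    "hippocampus_left": [3.1, 2.9, 2.5, 2.2, 2.0],
    "hippocampus_right": [3.0, 2.8, 2.6, 2.1, 1.9],  # nearly duplicates the left side
    "age": [70, 82, 65, 77, 74],
}

selected = drop_redundant(features)
print(selected)
```

The result depends on the order in which features are visited, so in practice the features might first be ranked, for example by their relevance to the target.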

One study uses a modified long short-term memory (LSTM) model to address the issue of disease progression models neglecting chronological dependencies of multiple biomarkers and making assumptions about patients’ trajectories. It additionally uses an LDA classifier and achieves an Area Under the Receiver Operating Characteristics (AUROC) score of 0.90 (Mehdipour Ghazi et al., 2019). Another study uses a combination of a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BGRU) for longitudinal analysis of MRI images. The CNN learns the spatial features of MRI for classification, and three cascaded BGRUs are trained on the output of the CNN at multiple time points to extract longitudinal features. The BGRU also handles the issue of incomplete longitudinal data by processing image sequences of varying length. The architecture achieves an accuracy of 91.33% for Alzheimer’s disease vs. normal patients (NC) and 71.71% for progressive MCI subjects (pMCI) vs. stable MCI subjects (sMCI) (Cui and Liu, 2019). In addition, the study finds the use of high-dimensional and longitudinal data challenging.

In summary, the area is highly researched and multiple machine learning techniques are used to model the disease progression. However, there seems to be no evidence of the application of a popular technique called Extreme Gradient Boosting (XGBoost) or of an ensemble of classifiers that includes XGBoost. Some of the studies reviewed use accuracy as a metric. The data available for predicting the progression of the disease are generally imbalanced, and accuracy is not a good metric for imbalanced data. There is also a lack of literature utilizing novel techniques like Shapley Additive Explanations (SHAP) to explain the output of the model in an interpretable manner. This project implements techniques to handle imbalanced data and develops ensemble learning models. It uses a normalized confusion matrix, the average multiclass AUROC score, a multiclass AUROC score dictionary and the AUROC curve as metrics to measure the performance of the model, and SHAP to explain the output of the model.
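A small, invented example shows why accuracy is misleading on imbalanced data and what a row-normalized confusion matrix reveals instead. The class labels and counts below are hypothetical, and the code is a sketch and not the project’s evaluation pipeline. A classifier that always predicts the majority class reaches 90% accuracy while never detecting the two minority classes.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so the diagonal holds per-class recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical, heavily imbalanced labels: 90 normal, 7 MCI, 3 dementia.
labels = ["CN", "MCI", "AD"]
y_true = ["CN"] * 90 + ["MCI"] * 7 + ["AD"] * 3
y_pred = ["CN"] * 100  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
normalized = normalize_rows(confusion_matrix(y_true, y_pred, labels))
balanced_accuracy = sum(normalized[i][i] for i in range(len(labels))) / len(labels)

print("accuracy:", accuracy)                             # 0.9
print("balanced accuracy:", round(balanced_accuracy, 3))  # 0.333
for label, row in zip(labels, normalized):
    print(label, row)
```

The normalized matrix makes the failure visible: the MCI and AD rows place all of their mass in the CN column, and the mean of the diagonal (balanced accuracy) drops to one third.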

A Review of Web-based Application and Identified Gaps

A review of the literature and a web search show that there is a large research gap when it comes to end-users benefiting from the developed models through a web-based application. Only one study was found that developed a web-based application, an Alzheimer’s disease prevalence model for the state of Maryland, USA. The application enables the user to define the parameters and any interventions to create a burden projection for each calendar year until 2050. It also calculates the costs of the potential interventions that may either reduce or slow the progression (Colantuoni et al., 2010). Far more web-based applications are being developed for other diseases, e.g., to monitor patients with Parkinson’s disease remotely and to support decision making for practitioners (Patel et al., 2010), (Memedi et al., 2011). Web-oriented expert systems are also being used in other domains, such as predicting results for hurdle races (Przednowek et al., 2018) or measuring chemical toxicity (Alves et al., 2018).

A huge amount of research is being done to classify the disease stages. Nevertheless, none of the reviewed literature has deployed the models to the web to enable the end-user to access them easily. The project develops a user-friendly web-based interface as a proof of concept that can be used to classify the stages of the disease based on a few parameters. It aims to serve a social cause and benefit future research.


The literature review identifies the gap and a need to further research the progression and answer the research question. The review also highlights a need for the development of a web-based application and hence answers the sub-research question.

The report continues here.
