
Classification of Alzheimer’s Disease Stages using Radiology Imaging and Longitudinal Clinical Data – Part 1

“Classification of Alzheimer’s Disease Stages using Radiology Imaging and Longitudinal Clinical Data” is the topic of my final year project as part of my MSc in Data Analytics. I am publishing the technical report in full. Please let me know if you have any questions. The report and code are available from this GitHub repository.


Alzheimer’s disease is an irreversible, degenerative disease that causes ongoing loss of brain function. Currently, no medicine or treatment can stop or slow its progression. Identifying the different stages for diagnosis requires a combination of clinical data, complex cognitive tests, radiology imaging, demographic information, time and highly skilled physicians. Recent machine learning techniques can help to extract insights from this data, improve the quality of life for patients and assist physicians. In this project, various machine learning techniques such as feature selection, feature engineering, handling of imbalanced data, imputation of missing values and standardization are applied. Multiple algorithms are also compared before a random grid search is performed to tune the hyperparameters of the classifiers and an ensemble learner is developed to classify three clinical stages (normal, mild cognitive impairment and dementia). Cognitive tests, magnetic resonance imaging of the left hippocampus and cortical thickness of the right entorhinal are found to be important features for prediction. This finding is similar to that reported in a number of studies. The model distinguishes each class from the other classes with an average area under the receiver operating characteristic (AUROC) score of 0.83. This is within the range of the evaluation metric of existing state-of-the-art models.

A web-based application is developed and deployed to the cloud so that users can benefit from the developed model. This results in an end-to-end pipeline that gives the user a practical application and contributes to the active research in the area.


Life expectancy has increased globally through effective diagnosis and medicine. However, no medicine or treatment can cure Alzheimer’s disease. The disease affects more than 47 million people globally (Nanni et al., 2016). Furthermore, the number of people affected is increasing annually, and deaths attributed to the disease have been rising continuously while deaths from other diseases have been falling. For example, in the U.S.A., the number of people affected is expected to rise from 5.4 million diagnosed in 2016 to 13.8 million by 2050. The rate of diagnosis is also expected to double, from one new case every 66 seconds in 2016 to one every 33 seconds by 2050. In 2013, 84,767 deaths were due to Alzheimer’s disease. Deaths due to chronic diseases such as stroke and heart disease decreased between 2000 and 2013, while deaths from Alzheimer’s disease increased by 71% (Alzheimer’s Association, 2016).

In Ireland, the number of patients is expected to rise from 48,000 in 2011 to 150,000 by 2046 (O’Kelly, 2016). The disease also affects the health and financial security of family members and the time they spend caring for the patient. Alzheimer’s disease is an irreversible disease with ongoing damage to memory and other cognitive functions. The changes in the brain begin years before dementia is diagnosed, and the pace of progression differs among patients (Bilgel and Jedynak, 2018). Moreover, the specific cause is still unknown, and only a few patients are diagnosed correctly and in a timely manner (Kruthika et al., 2019a). Hence, changes in biomarkers (biological features used to measure the presence, absence or risk of developing the disease) and cognitive markers need to be discovered at the initial stage in order to intervene, manage symptoms and offer effective care. Active management of the disease improves the quality of life for both the patient and their family members. It also increases coordination among the physicians and the patient.

Motivation and Background

Currently, no test can definitively diagnose Alzheimer’s disease other than a brain biopsy after death. There is also no pharmacological treatment or medicine that can stop or slow down the progression. Few drug trials succeed because of the high cost of drug development and the long observation time needed to follow the progression. Decline can be rapid, but rapid decline cannot be attributed to the disease alone. The symptoms vary among patients, with no clear reason why some people progress to advanced stages. It is also challenging to distinguish the disease from age-related cognitive decline and other neurological disorders (Moscoso et al., 2019).

Alzheimer’s disease is a progressive, irreversible disease that leads to loss of brain function. The decline occurs because the nerve cells responsible for cognitive functions are either damaged or destroyed. The person’s ability to sustain essential functions such as walking, reasoning and swallowing is affected. Patients in the final stage are bed-bound and need round-the-clock care (Zhang et al., 2017). Typical symptoms include an inability to perform routine tasks or solve problems, confusion about time, place or relationships, poor judgement, and misplacing things without being able to retrace the steps to recover them.

Additionally, the health care costs are greater than for any other disease. For example, in the U.S.A., the total cost of care amounted to $259 billion in 2017. The cost of care for a patient in the last five years of their life is estimated at $341,651 (New York State Coordinating Council, 2017). Furthermore, approximately 15 million family members and other unpaid caregivers contribute 18.1 billion hours of care (Alzheimer’s Association, 2016).

The techniques for diagnosis include gathering information about family and medical history, feedback from relatives or friends about the changes in skills or behaviour, physical and cognitive tests, blood tests and brain imaging. The use of cognitive tests with magnetic resonance imaging (MRI) of the brain is the most popular method to identify the deterioration of the brain (Moscoso et al., 2019). The distinct stages of Alzheimer’s disease are:

• Pre-clinical phase before the symptoms occur.

• Mild cognitive impairment (MCI), which involves greater cognitive decline than expected for the patient’s age but does not affect routine life.

• Dementia (Alzheimer’s Association, 2016).

Research Question

Although it was identified 100 years ago, Alzheimer’s disease has been recognized as the primary cause of dementia and a major cause of death only in the last 30 years. Researchers believe that early detection is the key to preventing, slowing or stopping the disease (Alzheimer’s Association, 2016; Zhang et al., 2017).

RQ: “Can identification and classification of different Alzheimer’s disease stages (normal, mild cognitive impairment, dementia) using several machine learning techniques and algorithms (logistic regression, support vector machine, an ensemble of classifiers, etc.) help to improve research and support practitioners to provide early intervention and care to patients?”

Identifying changes at an early stage of the disease helps in managing the patient’s treatment and care. It also increases coordination among the physicians and the patient and improves the quality of life through all stages of the disease for both the patient and family members (Alzheimer’s Association, 2016). Further, the area is being researched actively and many discoveries remain to be made, such as the precise cause of the biological changes, the rate of progression and how it can be stopped or slowed down. The web-based application gives the user a way to determine the stage of the disease from a given set of parameters.

Sub-RQ: “Can a web-based application framework help enhance the user experience for the identified stages in the disease progression?”

To answer the research question, the project implements and evaluates different machine learning techniques to find the best performing model for classifying the various stages of the disease. A model is then chosen to develop an open-source web-based application that can be used by anyone. The following objectives are pursued to answer the research question.

Research Objectives and Contribution

The first objective is a critical review of the literature on Alzheimer’s disease between 2004 and 2019. The goal is to understand the problem and identify the gaps. These gaps are addressed in the two tables as the objectives of the project. The project contributes to the research by replicating some of the literature and by implementing machine learning techniques to develop a model that classifies the stages of the disease, i.e., normal, mild cognitive impairment or dementia. The evaluation metrics for measuring performance are the normalized confusion matrix, the average multiclass AUROC (Area Under the Receiver Operating Characteristic) score, the multiclass AUROC dictionary and the AUROC curve. Furthermore, a model is used for developing a web-based application.
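As a rough illustration of these evaluation metrics, the sketch below computes a row-normalized confusion matrix, a per-class one-vs-rest AUROC dictionary and its unweighted average in plain Python. It is not the code from the report: the function names, the label strings and the six toy predictions are all made up for the example, and the one-vs-rest scheme with an unweighted average is an assumption about how the multiclass score is formed.

```python
LABELS = ["normal", "mci", "dementia"]  # hypothetical label names


def normalized_confusion_matrix(y_true, y_pred, labels):
    """Row i holds the share of true class i predicted as each class."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        counts[index[truth]][index[pred]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def binary_auroc(is_positive, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(y_true, y_prob, labels):
    """Return the per-class AUROC dictionary and its unweighted average."""
    per_class = {}
    for k, label in enumerate(labels):
        flags = [truth == label for truth in y_true]
        scores = [row[k] for row in y_prob]
        per_class[label] = binary_auroc(flags, scores)
    average = sum(per_class.values()) / len(per_class)
    return per_class, average


# Toy data (invented): true stages and predicted class probabilities.
y_true = ["normal", "normal", "mci", "mci", "dementia", "dementia"]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.4, 0.4],
    [0.1, 0.3, 0.6],
    [0.3, 0.5, 0.2],
]
# Predicted label = class with the highest probability (first one on ties).
y_pred = [LABELS[max(range(len(LABELS)), key=row.__getitem__)] for row in y_prob]

cm = normalized_confusion_matrix(y_true, y_pred, LABELS)
per_class, average = one_vs_rest_auroc(y_true, y_prob, LABELS)
print(cm)
print(per_class, average)
```

Normalizing each row by the number of true cases keeps the matrix readable when the classes are imbalanced, and the per-class dictionary shows which stage the classifier separates least well before the scores are averaged.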

The major and minor contributions resulting from this research can be summarized as:

• A generalized model that handles radiology images and longitudinal clinical data to distinguish different stages of Alzheimer’s disease i.e., normal, mild cognitive impairment and dementia.

• The model handles an imbalanced data set and missing values to improve the performance.

• Chronological dependencies are modelled using an ensemble of Extreme Gradient Boosting (XGBoost) and other classifiers. To the best of the researcher’s knowledge, this is the first time this technique has been applied using radiology images, characteristics of the patient and longitudinal clinical data.

• An end-to-end pipeline to develop a web-based application to predict the stages of the disease. It is a practical application of the process.
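The handling of missing values, class imbalance and the ensemble can be sketched in a few lines of plain Python. This section does not say which imputation rule, weighting scheme or combination rule the project uses, so the median imputation, inverse-frequency class weights and soft voting (averaging class probabilities) below are assumptions, and every function name and value is invented for illustration.

```python
from collections import Counter
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def soft_vote(probabilities, class_names):
    """Average the class probabilities of several models and pick the top class."""
    n_models = len(probabilities)
    mean = [
        sum(p[i] for p in probabilities) / n_models
        for i in range(len(class_names))
    ]
    best = max(range(len(class_names)), key=mean.__getitem__)
    return class_names[best], mean


# Made-up feature column with two missing measurements.
imputed = impute_median([3.1, None, 2.7, 3.4, None])

# Made-up, imbalanced stage labels: rarer stages receive larger weights.
stages = ["normal"] * 5 + ["mci"] * 3 + ["dementia"] * 2
weights = balanced_class_weights(stages)

# Made-up probabilities from three hypothetical classifiers for one patient.
stage, mean_prob = soft_vote(
    [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.5, 0.2]],
    ["normal", "mci", "dementia"],
)
print(imputed, weights, stage, mean_prob)
```

Libraries such as scikit-learn provide equivalent building blocks (for example SimpleImputer and VotingClassifier); the plain-Python version is used here only to keep the sketch free of dependencies.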

The scope of the project is the implementation, evaluation and presentation of the results for several machine learning techniques. The techniques are further investigated to determine the factors which contribute to the output of the model. After a thorough explanation, a model is used to build a web-based application. Certain machine learning techniques such as recurrent neural networks and convolutional neural networks are out of the scope of this project because of the lack of access to a graphics processing unit (GPU).

The report continues with the literature review.
