
Regular Articles

Published Online: 14 January 2019

Brain MR Radiomics to Differentiate Cognitive Disorders

Sara Ranjbar, Ph.D., Stefanie N. Velgos, M.Sc., Amylou C. Dueck, Ph.D., Yonas E. Geda, M.D., M.Sc., and J. Ross Mitchell, Ph.D.; The Alzheimer's Disease Neuroimaging Initiative

Publication: The Journal of Neuropsychiatry and Clinical Neurosciences

Volume 31, Number 3

https://doi.org/10.1176/appi.neuropsych.17120366


Abstract

Objective:

Subtle and gradual changes occur in the brain years before cognitive impairment due to age-related neurodegenerative disorders. The authors examined the utility of hippocampal texture analysis and volumetric features extracted from brain magnetic resonance (MR) data to differentiate between three cognitive groups (cognitively normal individuals, individuals with mild cognitive impairment, and individuals with Alzheimer’s disease) and neuropsychological scores on the Clinical Dementia Rating (CDR) scale.

Methods:

Data from 173 unique patients with 3-T T₁-weighted MR images from the Alzheimer’s Disease Neuroimaging Initiative database were analyzed. A variety of texture and volumetric features were extracted from bilateral hippocampal regions and were used to perform binary classification of cognitive groups and CDR scores. The authors used diagonal quadratic discriminant analysis in a leave-one-out cross-validation scheme. Sensitivity, specificity, and area under the receiver operating characteristic curve were used to assess the performance of models.

Results:

The results show promise for hippocampal texture analysis to distinguish between no impairment and early stages of impairment. Volumetric features were more successful at differentiating between no impairment and advanced stages of impairment.

Conclusions:

MR radiomics may be a promising tool to classify various cognitive groups.

The global dementia epidemic carries a widespread emotional and financial burden on patient families, caregivers, and society (1). Currently, dementia of the Alzheimer’s type is the sixth leading cause of death in the United States, yet it is the only disease among the top 10 causes of death that cannot be prevented or cured (2). To date, clinical trials for Alzheimer’s disease therapeutics have been universally disappointing.

One significant factor for the slow progress is the lack of powerful early detection methods of cognitive impairment. Alzheimer’s disease is characterized by the deposition of beta amyloid (Aβ) and hyperphosphorylated tau, resulting in plaques and neurofibrillary tangles, respectively. One hypothetical biomarker model describes the temporal order of disease stages as follows: Aβ plaque accumulation; neuronal injury; brain structure atrophy; memory loss; and general cognitive decline (3). Clinical trials may fail because these neuropathological changes precede cognitive deficit manifestations by several decades (4–8). Consequently, irreversible brain damage may have already occurred. Thus, identifying quantifiable biomarkers for early cognitive impairment is of profound public health importance. Early detection may allow earlier pharmacological interventions when patients may be more responsive to treatments. In addition, early detection would allow patients to make conscious decisions about their situation (personal and property) if their underlying diseases lead to progression to dementia. However, as of now, early detection of cognitive impairment is challenging.

Multiple studies have used structural magnetic resonance (MR) imaging to predict Alzheimer’s disease (9–13). Several studies found that local hippocampal and total brain volume are significantly reduced in Alzheimer’s disease and mild cognitive impairment compared with healthy elderly individuals (14–23). The hippocampus is affected early, and generally severely, in the Alzheimer’s disease pathological process (24). Hippocampal volume is the most studied structural biomarker of Alzheimer’s disease and is used in the criteria for its diagnosis (25). In addition, prediction of conversion from mild cognitive impairment to Alzheimer’s disease has been correlated with the rate and amount of hippocampal, medial temporal lobe, and total brain atrophy (26–31).

Biomedical texture analysis aims to quantitatively describe pixel/voxel intensity distributions and the interrelations of pixel intensities across multiple spatial scales. Texture analysis has been used previously in the context of Alzheimer’s disease (14, 28, 32–35). Radiomics is an emerging approach to image analysis and refers to high-throughput extraction of quantitative features from radiological images in order to convert images into structured and mineable data (36–38). Radiomics pipelines often employ a variety of texture analysis methods to provide a holistic representation of texture-based information of the image or regions of interest in the image. Radiomics-based models have revealed predictive and prognostic associations between images and clinical outcomes (36–38). These models offer the potential of capturing often overlooked or hidden information on underlying disease dynamics. Our group has developed a radiomics texture analysis platform that has been used to characterize gene expression patterns of brain cancer (39, 40), to aid in the diagnosis of head and neck cancers (41, 42) and breast cancer (43).

The aim of the present study was to differentiate between three cognitive groups (cognitively normal individuals, individuals with mild cognitive impairment, and individuals with Alzheimer’s disease) and scores on the Clinical Dementia Rating (CDR) scale using MRI-based texture and volume measurements from the hippocampus. We hypothesize that changes in neuropsychological function related to cognitive impairment have a radiological counterpart, detectable via structural MRI. We also hypothesize that texture analysis will be sensitive enough to identify early MRI structural hippocampal changes related to the early Alzheimer’s disease pathophysiologic process, which will be correlated with cognitive groups and CDR scores. Specifically, our objectives are twofold: to use MR radiomics features to differentiate between cognitive groups (cognitively normal, mild cognitive impairment, Alzheimer’s disease) and to predict neuropsychological performance, quantified via CDR scores. The contributions of this study are: identification of MR-derived features that could be used in detecting early cognitive impairment; assessing the use of a granular measure of cognition assessment (such as CDR scores) compared with generic grouping for predictive modeling; and comparing the utilities of volume and texture features in this task.

Methods

ADNI Data Set

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public–private partnership with the primary goal of testing whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. We selected cases from the shared image collection ADNI-1, a 5-year study with a cohort of 200 cognitively normal individuals, 200 individuals with mild cognitive impairment, and 400 individuals with Alzheimer’s disease (44). The participants were divided into the assigned cognitively normal, mild cognitive impairment, and Alzheimer’s disease groups and underwent 3-T imaging at the following time points: baseline, 6, 12, 18 (mild cognitive impairment only), and 24 months. We categorized participants into three cognitive groups as assigned by ADNI-1: cognitively normal, mild cognitive impairment, and Alzheimer’s disease. Group-specific inclusion criteria are available on ADNI’s website under the General Procedures Manual or under Study Design, Background and Rationale (45, 46). Briefly, cognitively normal participants have Mini-Mental State Exam (MMSE) scores between 24 and 30 (inclusive) and a CDR of 0, and are non-depressed, without mild cognitive impairment, and non-demented (45). Participants with mild cognitive impairment have MMSE scores between 24 and 30 (inclusive), a memory complaint, objective memory loss measured by education-adjusted scores on Wechsler Memory Scale Logical Memory II, a CDR of 0.5, absence of significant levels of impairment in other cognitive domains, essentially preserved activities of daily living, and an absence of dementia (45). Alzheimer’s disease participants have MMSE scores between 20 and 26 (inclusive), CDR of 0.5–2, abnormal memory function documented by scoring below the education-adjusted cutoff on the Logical Memory II subscale (Delayed Paragraph Recall) from the Wechsler Memory Scale, and meet the NINCDS/ADRDA criteria for probable Alzheimer’s disease (45).

Cognitive Measures

The CDR score is obtained through semi-structured interviews with patients and informants to evaluate six domains: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care (47). Patients are then classified on the following ordinal scale: 0 (no impairment), 0.5 (questionable impairment), 1 (mild dementia), 2 (moderate dementia), or 3 (severe dementia). Typically, a score of 0.5 is given to individuals with a diagnosis of mild cognitive impairment (48, 49).

Study Participants

The initial participant selection criteria were as follows: available CDR score associated with the time of image acquisition and available 3-T T₁ scanning protocol to ensure maximum resolution for the image analysis.

We found 204 unique participants in ADNI-1 with available 3-T T₁ MR images. Image data were available for all participants at different time points ranging from baseline to month 24. Because we were interested in predicting static cognition levels (CDR scores, cognitive groups), the time point was irrelevant. We selected one time point per participant to ensure unique participants across groups. To maximize group sizes, we first selected participants with a CDR score of 2, who were in the minority. These participants were excluded from all the other groups. Next, participants with CDR scores of 1 and 0.5 were selected. All the remaining participants not assigned to any groups were placed in the CDR 0 group. Individuals with a CDR score of 3 were excluded due to our small sample size. Then, we proceeded to find the 3-T MR scan time points associated with the assigned group labels for participants. The image data acquired at the selected time points were used for analysis. Thirty-one participants in total were excluded. The exclusions were due either to a mismatch between imaging and CDR score acquisition date (N=21) or image unavailability (N=10). This led to a final sample size of 173 individuals, with 67 classified as non-impaired (CDR 0), 48 with questionable cognitive impairment (CDR 0.5), 39 with mild cognitive impairment (CDR 1), and 19 with moderate cognitive impairment (CDR 2).

Demographic and clinical characteristics of the included study participants are presented in Table 1 and Table 2. It is noteworthy that to receive a diagnosis of mild cognitive impairment or Alzheimer’s disease, in addition to clinician judgment, intra-individual decline must be obtained with serial cognitive measurements (multiple CDR scores over time) or by a history of change from previously attained levels (50). Thus, the numbers of participants in the cognitive groups and in the CDR score groups differ.

TABLE 1. Demographic and clinical characteristics of the cognitive groups^a

Characteristic	Cognitively normal (N=62)		Mild cognitive impairment (N=70)		Alzheimer’s disease (N=41)
	Mean	SD	Mean	SD	Mean	SD
Age (years)	75.2	4.7	76.0	8.4	76.1	8.7
	N	%	N	%	N	%
Sex, male	26	41.9	43	61.4^b	16	39.0

^a Percent indicates percentage of the specific group.

^b A significantly higher proportion of males were in the mild cognitive impairment group (Pearson’s χ²=5.2120, df=2, p=0.02).

TABLE 2. Demographic and clinical characteristics by Clinical Dementia Rating (CDR) scale scores^a

Characteristic	CDR 0 (N=67)		CDR 0.5 (N=48)		CDR 1 (N=39)		CDR 2 (N=19)
	Mean	SD	Mean	SD	Mean	SD	Mean	SD
Age (years)	74.9	5.3	75.8	7.3	74.3	8.9	81.3	7.6^b
	N	%	N	%	N	%	N	%
Sex, male	29	43.3	25	52.1	22	56.4	9	47.4

^a Participants with a CDR score of 0 were classified as having no impairment, those with a score of 0.5 were classified as having questionable cognitive impairment or mild cognitive impairment, those with a score of 1 were classified as having mild dementia or impairment, and those with a score of 2 were classified as having moderate dementia or impairment. Percent indicates percentage of the specific group.

^b A significantly higher age (years) was observed in the CDR 2 group (p<0.0001, Student’s t-test).

Image Preprocessing

MR images can have large intensity variations when acquired from different scanners or under different acquisition parameters. ADNI performs several preprocessing steps on magnetization-prepared rapid gradient-echo (MP-RAGE) sequence images. This includes gradwarp geometry distortion correction and B1 and N3 intensity non-uniformity corrections (51) to ensure comparability of images across devices and protocols. To ascertain the comparability of images across patients, we normalized all images to have a common mean and variance in CSF (52). Texture and volume analyses were performed using the normalized images.
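The CSF-based normalization step can be illustrated with a minimal sketch. It assumes a linear rescaling that gives the CSF voxels a chosen mean and standard deviation; the function name `normalize_to_csf`, the boolean CSF mask, and the target values are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np


def normalize_to_csf(image, csf_mask, target_mean=0.0, target_std=1.0):
    """Linearly rescale an image so that its CSF voxels have a common mean and std.

    image    : array of voxel intensities (any shape)
    csf_mask : boolean array of the same shape marking CSF voxels (assumed given)
    """
    csf = image[csf_mask]
    mu, sigma = csf.mean(), csf.std()
    if sigma == 0:
        raise ValueError("CSF region has zero variance; cannot normalize.")
    # Standardize against the CSF statistics, then move to the shared target scale.
    return (image - mu) / sigma * target_std + target_mean


# Synthetic example: a random volume with a small cube standing in for CSF.
rng = np.random.default_rng(0)
volume = rng.normal(loc=300.0, scale=40.0, size=(8, 8, 8))
mask = np.zeros(volume.shape, dtype=bool)
mask[2:5, 2:5, 2:5] = True
normalized = normalize_to_csf(volume, mask, target_mean=100.0, target_std=10.0)
```

After this transform, every image has the same CSF mean and variance, so intensity differences elsewhere are expressed relative to a shared reference tissue.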

Texture Analysis

The imaging data were imported into the MIPAV (Medical Image Processing, Analysis, and Visualization) application version 7.2.0 (53). To avoid resampling the images, we limited the segmentation of the hippocampus to the coronal view since it provided a common pixel spacing of (1.02, 1.02) mm across all patients. Experts identified three slices with the largest possible view of the bilateral hippocampi and manually placed rectangular regions of interest (ROIs) (16×16 pixels) on the area of the hippocampi, while avoiding inclusion of areas outside the hippocampus (Figure 1A) as much as possible. This segmentation process resulted in six ROIs (3 slices×2 hippocampi) per patient. This segmentation is considered greater than two dimensional and less than three dimensional (often referred to as 2.5D), and it improves the reliability of the sampling process. The ROIs were cropped out of the images and set aside for texture analysis. The individuals who manually placed ROIs on the hippocampi were blinded to the diagnosis; another blinded individual performed quality control checks to ensure ROIs were centrally placed.

Next, we acquired the mean, standard deviation, and range of voxel intensities across the ROIs (subsequently referred to as raw intensity features). We then mapped the dynamic ranges of intensities inside the ROIs to 0–255 as a preprocessing step for characterization of texture. Several statistical and spectral texture analysis methods are included in our radiomics pipeline. Textural features describing patterns or spatial distribution of voxel intensities were calculated from second-order statistical gray-level co-occurrence matrices (GLCM) (54), Laplacian of Gaussian histograms (LoGHist) (55), rotationally invariant Discrete Orthonormal Stockwell Transform (DOST) (56), Gabor filter banks (GFB) (57), and local binary patterns (LBP) (58). These methods were implemented in the Python programming language using custom-written code and open-source libraries (59, 60). In total, we extracted 119 features per ROI: three raw intensity, 26 GLCM, 10 DOST, 36 LoGHist, 12 LBP, and 32 GFB features. Extensive details on these features can be found in Ranjbar et al., Patel et al., and Ramkumar et al. (42, 43, 61). To account for sampling variability, we averaged the features over slices without losing the laterality information, leading to a total of 238 texture features (119 per hippocampus) per patient.
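To make the intensity mapping and one of the texture families concrete, the sketch below rescales a ROI to 0–255 and derives a single GLCM feature (contrast) for one pixel offset. It is an illustration only: the offsets, the number of gray levels, and the specific 26 GLCM features used in the study are not stated here, so the function names and parameters below are assumptions.

```python
import numpy as np


def rescale_to_uint8(roi):
    """Map the ROI's dynamic range linearly onto the integers 0..255."""
    roi = roi.astype(float)
    lo, hi = roi.min(), roi.max()
    if hi == lo:
        return np.zeros(roi.shape, dtype=np.uint8)
    return np.round((roi - lo) / (hi - lo) * 255).astype(np.uint8)


def glcm(roi, dx=1, dy=0, levels=256):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy >= 0)."""
    h, w = roi.shape
    ref = roi[0:h - dy, 0:w - dx]   # reference pixels
    nbr = roi[dy:h, dx:w]           # neighbors at the chosen offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T              # count each pair in both directions
    return counts / counts.sum()


def glcm_contrast(p):
    """Contrast: expected squared gray-level difference between paired pixels."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


# A 16x16 ROI of alternating vertical stripes (synthetic, for illustration).
roi = rescale_to_uint8(np.tile(np.array([0, 100]), (16, 8)))
horizontal = glcm_contrast(glcm(roi, dx=1, dy=0))  # neighbors always differ
vertical = glcm_contrast(glcm(roi, dx=0, dy=1))    # neighbors always match
```

On the striped example, contrast is maximal along the horizontal offset and zero along the vertical one, which is the kind of directional intensity pattern that co-occurrence statistics are designed to capture.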

Volumetric Features

We used the volBrain system for computation of hippocampal volumetric measurements. Given a stack of MR images, volBrain (62, 63) automatically segments parenchyma, brain tissues, macrostructure, and subcortical structures (shown in Figure 1B) and reports volumetric measurements of the structures. For this study, we used two volumetric features for the hippocampus area: relative volume (%) and asymmetry index (%). Relative volume represents the sum of the hippocampi volumes in relation to the volume of the intracranial cavity. The asymmetry index is the difference between right and left volumes divided by their mean.
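The two volumetric features follow directly from the definitions above. The short sketch below writes them out; the function names and the example volumes are hypothetical, and volBrain reports these quantities itself, so this is only a restatement of the arithmetic.

```python
def relative_volume_percent(left, right, intracranial):
    """Sum of the two hippocampal volumes as a percentage of intracranial volume."""
    return 100.0 * (left + right) / intracranial


def asymmetry_index_percent(left, right):
    """Right-minus-left volume difference divided by the mean volume, in percent."""
    mean_volume = (left + right) / 2.0
    return 100.0 * (right - left) / mean_volume


# Hypothetical volumes in cubic centimeters, chosen only to show the arithmetic.
rel = relative_volume_percent(left=3.0, right=3.0, intracranial=1500.0)  # 0.4 (%)
asym = asymmetry_index_percent(left=3.0, right=5.0)                      # 50.0 (%)
```

Dividing by intracranial volume adjusts for head size, and the sign of the asymmetry index indicates which side is larger (positive when the right hippocampus is larger).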

Statistical Analysis and Machine Learning

Age and sex differences between groups were tested using Student’s t-test and Pearson’s chi-square test, respectively. Statistical significance was defined as a p value <0.05. We performed univariate analysis to compare the difference in texture and volume feature values for both CDR groups and cognitive groups. The p values were adjusted for multiple comparisons using the Benjamini and Hochberg false discovery rate method (64).
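For readers unfamiliar with the Benjamini and Hochberg adjustment, a minimal pure-Python sketch is given below. The function name and the example p values are illustrative and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Four illustrative p values from hypothetical univariate feature comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values ordered consistently with the raw ones.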

We applied principal component analysis (PCA) to reduce dimensionality of texture features (65). To maintain interpretability of the principal components, PCA was applied to features stemming from a common texture analysis method. Several comparative datasets were generated with PCA to find the optimal level of variance. The final set of PCs represented 90% of the variance in the original features. Texture PCs combined with volume features were used in supervised classification of two label variables: cognitive groups (cognitively normal, mild cognitive impairment, Alzheimer’s disease) and CDR scores.
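The per-method dimensionality reduction can be sketched as follows. This is an assumption-laden illustration: it mean-centers the features without further scaling (the text does not specify scaling), uses a plain SVD, and the names `pca_retain_variance` and `reduce_by_family` are hypothetical. scikit-learn's `PCA` also accepts a fractional `n_components` for the same purpose.

```python
import numpy as np


def pca_retain_variance(features, variance_to_keep=0.90):
    """Project features onto the fewest PCs whose cumulative variance reaches the target."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # First position at which the cumulative explained variance reaches the target.
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    n_components = min(n_components, len(s))
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]


def reduce_by_family(families, variance_to_keep=0.90):
    """Apply PCA separately to each texture family so each PC stays tied to one method."""
    return {name: pca_retain_variance(block, variance_to_keep)[0]
            for name, block in families.items()}


# Synthetic stand-in: 173 participants, two feature families of different widths.
rng = np.random.default_rng(0)
families = {
    "GLCM": rng.normal(size=(173, 3)) @ rng.normal(size=(3, 26)),
    "LBP": rng.normal(size=(173, 2)) @ rng.normal(size=(2, 12)),
}
reduced = reduce_by_family(families)
```

Keeping one PCA per texture method means a selected component such as "LBP pc 1" can still be traced back to a single family of features.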

Machine learning was conducted using the open-source Python-based package scikit-learn (66) and custom-written scripts. We used a leave-one-out cross-validation (LOOCV) scheme to predict the labels (65) and to select features for training. LOOCV iteratively uses all samples except one for model training. In each round, the left-out sample serves as the test case to assess the generalizability of the trained model on an unseen case. In each round, a trained model was generated using features selected by Sequential Forward Feature Selection (SFFS) (65) and an internal cross-validation (CV). Starting from an empty set, SFFS sequentially added features as long as their addition resulted in a CV accuracy improvement of at least 5%. We used diagonal quadratic discriminant analysis (DQDA) as the classification method (65). DQDA is a naïve Bayes classifier that allows for diagonal class covariance matrices and has been shown to be successful in classification tasks of high-dimensional data with small sample sizes (67). Several studies have shown that DQDA has comparable or better performance than support vector machines in classification of high-dimensional data (68, 69).
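A compact sketch of DQDA inside a leave-one-out loop is given below. It is not the study's code: the inner feature-selection step (SFFS with internal CV) is omitted for brevity, the small variance floor is an added assumption for numerical stability, and all function names are hypothetical. The classifier itself is the standard Gaussian model with a separate diagonal covariance per class.

```python
import numpy as np


def dqda_fit(X, y):
    """Estimate per-class feature means, per-class feature variances, and class priors."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        variance = Xc.var(axis=0, ddof=1) + 1e-9  # small floor: an assumption
        params[label] = (Xc.mean(axis=0), variance, len(Xc) / len(X))
    return params


def dqda_predict(params, x):
    """Assign x to the class with the highest diagonal-Gaussian log posterior."""
    scores = {}
    for label, (mean, var, prior) in params.items():
        log_likelihood = -0.5 * np.sum(np.log(var) + (x - mean) ** 2 / var)
        scores[label] = np.log(prior) + log_likelihood
    return max(scores, key=scores.get)


def loocv_accuracy(X, y):
    """Train on all samples but one, test on the held-out sample, and repeat for each."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        params = dqda_fit(X[keep], y[keep])
        hits += int(dqda_predict(params, X[i]) == y[i])
    return hits / len(y)


# Two well-separated synthetic classes with two features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)), rng.normal(6.0, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
accuracy = loocv_accuracy(X, y)
```

Because only per-feature variances are estimated, the number of parameters grows linearly with the number of features, which is one reason diagonal models are often preferred when samples are scarce.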

Our data, by its nature, contained class imbalance, in which dominance of the majority class can hinder the classifier’s ability to learn the inherent properties of each class. To ensure generalizability of the result in experiments with substantial class imbalance, we used an ensemble down-sampling approach coupled with the above-mentioned learning scheme. In each CV round the training samples were divided into majority and minority groups. The majority group was then randomly divided into subsets roughly the same size as the minority group. Each of the subsets was merged with the minority group and served as the training set. The average probability across models for the test sample was used as the probability for that sample. This iterative process allowed every sample in the data set to serve as the left-out sample once.
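The ensemble down-sampling idea can be sketched in a few lines. This is an interpretation under stated assumptions: the number of subsets is taken as the rounded ratio of majority to minority size, the split is a random partition, and the function names are hypothetical. The group sizes in the example loosely mirror the CDR 0 and CDR 2 groups.

```python
import random


def balanced_training_sets(majority, minority, seed=0):
    """Partition the majority class into minority-sized subsets and pair each with the minority."""
    if not minority:
        raise ValueError("The minority group must not be empty.")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_subsets = max(1, round(len(shuffled) / len(minority)))
    # Strided slices give disjoint subsets of nearly equal size.
    subsets = [shuffled[k::n_subsets] for k in range(n_subsets)]
    return [subset + list(minority) for subset in subsets]


def ensemble_probability(per_model_probabilities):
    """Average the test-sample probabilities produced by the models in the ensemble."""
    return sum(per_model_probabilities) / len(per_model_probabilities)


# Illustrative sample identifiers: 67 majority and 19 minority training cases.
majority = [("cdr0", i) for i in range(67)]
minority = [("cdr2", i) for i in range(19)]
training_sets = balanced_training_sets(majority, minority)
```

One model would be trained on each balanced set, and `ensemble_probability` would then combine their outputs for the left-out sample, so that every majority sample contributes to training without dominating any single model.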

The area under the receiver operating characteristic curve, sensitivity, and specificity were used to assess classification performance using the open-source software packages R (2.7) (70) and SciPy (0.15.1, Python 2.7) (71). The method of DeLong et al. and the pROC package (72) were used to estimate the receiver operating characteristic (ROC) curve significance, p values, and 95% confidence intervals (73). A p value below the significance level (0.05) indicates that the observed sample area under the ROC curve differs significantly from the null hypothesis (area=0.5) and is evidence that the model does have an ability to distinguish between the two groups.
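The three performance measures can be computed from first principles, as in the sketch below. It uses the rank-based (Mann-Whitney) form of the AUC and does not implement the DeLong significance test or confidence intervals; the function names and the example scores are illustrative only.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a positive case scores higher than a negative case."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative predicted probabilities for three impaired and three unimpaired cases.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

An AUC of 0.5 corresponds to chance-level ranking, which is the null value against which the models are tested.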

Results

The mild cognitive impairment group had a higher proportion of males than the cognitively normal and Alzheimer’s disease groups (Pearson’s χ²=5.2120, df=2, p=0.02). No significant difference was observed in sex ratio of the other groups. Including sex in models with texture did not impact results. As expected, the age of participants in the CDR 2 group was significantly higher than other CDR levels. Including age in models with volume did not impact results. Figure 2 compares volume features across groups and CDR scores. Figure 3 shows the univariate comparison of features across feature groups. Features extracted from left and right hippocampi showed similar significance levels. Increasing the level of variance included in the principal components of texture features did not improve the results.

FIGURE 2. Comparison of volume features across cognitive groups and Clinical Dementia Rating (CDR) scale scores^a
^a The plots show the distribution of the two volume features (y-axis) across different groups of participants: cognitive groups and CDR scores (x-axis). Percent volume shows the sum of hippocampal volumes in relation to the volume of intracranial cavity. The asymmetry index shows the difference between the right and left hippocampal volumes divided by their mean. AD=Alzheimer’s disease, CN=cognitively normal, MCI=mild cognitive impairment.

FIGURE 3. Radiomic features that differentiate Clinical Dementia Rating (CDR) scale scores and cognitive groups^a
^a Dependent variables are listed above columns (CDR score and cognitive group). Data were separated into different combinations of binary scores for each dependent variable, and univariate analysis was performed. Color maps show the false discovery rate-corrected p values of a two-sample t test within the data set of each classification problem. Red to white colors indicate significant (low) p values. A lower p value indicates a better ability to differentiate the pair of dependent variables in the column title. Dost=discrete orthonormal Stockwell transform, Gabor=Gabor filter banks, GLCM=gray-level co-occurrence matrices, HC=hippocampus, LBP=local binary patterns, LoGHist=Laplacian of Gaussian histograms.

Prediction of Cognitive Groups

The areas under the ROC curves (AUCs) for the classification of cognitive groups are shown in Figure 4A. Classification reached AUC levels of 0.89 (CI=0.82–0.94) for cognitively normal compared with Alzheimer’s disease; 0.86 (CI=0.79–0.91) for cognitively normal compared with mild cognitive impairment; and 0.70 (CI=0.61–0.77) for mild cognitive impairment compared with Alzheimer’s disease. The performance measures, selected features, and ROC curve analysis for the cognitive groups are summarized in Table 3. All three models were significant at a p value ≤0.05. Including sex in the models did not affect the results.

FIGURE 4. Comparison of area under the receiver operating characteristic (ROC) curves^a
^a Panel A shows the area under the ROC curves of cognitive group classification models. Panel B shows the area under the ROC curves of the Clinical Dementia Rating (CDR) scale score classification models. Error bars show the confidence interval of the areas under the curve. AD=Alzheimer’s disease, CN=cognitively normal, MCI=mild cognitive impairment. Bars with the same color in panels A and B are associated with classification experiments that differentiate similar levels of cognitive impairment. Dark bars=no impairment versus mild-moderate impairment (CN–AD [left]; CDR 0–2 and 0–1 [right]); medium bars=no impairment versus questionable impairment (CN–MCI [left]) or no impairment/questionable impairment versus mild impairment (CDR 0–0.5 and 0.5–1 [right]); light bars=questionable impairment to moderate impairment (MCI–AD [left]) or questionable and mild impairment versus moderate impairment (CDR 0.5–2 and 1–2 [right]).

TABLE 3. Classification results for prediction of cognitive groups^a

Cognitive group	Area under the curve	Sensitivity	Specificity	Feature type	Feature	Standard error^b	95% CI	Z statistic	p^c
CN compared with MCI	0.86	0.79	0.83	Texture	Left HC LoGHist pc 1; Right HC LBP pc 1	0.03	0.79, 0.91	11.58	<0.0001
MCI compared with AD	0.70	0.54	0.83	Texture	Left HC LBP pc 1	0.05	0.61, 0.77	4.16	<0.0001
CN compared with AD	0.89	0.82	0.87	Volume	% HC Volume	0.03	0.82, 0.94	12.31	<0.0001

^a AD=Alzheimer’s disease, CN=cognitively normal, HC=hippocampus, LBP=local binary patterns, LoGHist=Laplacian of Gaussian histograms, MCI=mild cognitive impairment, PC=principal component, % HC Volume=relative hippocampal volume in percent.

^b For further details, see DeLong et al. (73).

^c A p value below the significance level (0.05) indicates that the observed sample area under the receiver operating characteristic curve differs significantly from the null hypothesis (area=0.5).

Prediction of CDR Scores

The AUCs for the classification of CDR scores are shown in Figure 4B. The AUC levels of our models were: 0.98 (CI=0.93–0.99) for CDRs 0–2; 0.95 (CI=0.9–0.98) for CDRs 0–1; 0.84 (CI=0.76–0.89) for CDRs 0–0.5; 0.73 (CI=0.61–0.83) for CDRs 0.5–2; 0.71 (CI=0.61–0.8) for CDRs 0.5–1; and 0.56 (CI=0.42–0.69) for CDRs 1–2. Overall, models were more successful in classification when the target groups were farther apart on the CDR spectrum. Details of the models’ performance and significance, selected features, and ROC curve statistics for this analysis are presented in Table 4. All models were significant at a p value ≤0.05 except for the classification model CDR 1–2. Relative volume of hippocampi (percent volume) was a predictive feature in two of the six models. We conducted further analysis to assess whether age accounted for the significance of percent volume. When age was included in the model, percent volume remained highly statistically significant (p=0.003), while age was not significant (p=0.35). The AUC only slightly increased from 0.98 (model with percent volume alone) to 0.9910 (model with percent volume and age). A model containing age by itself resulted in an AUC of only 0.785, and the addition of percent volume significantly improved the model fit (p<0.0001). Thus, we conclude that percent volume is meaningful in differentiating between CDR 0 and 2, independent of age.

TABLE 4. Classification results for prediction of the Clinical Dementia Rating (CDR) scale score^a

CDR pairs^b	Area under the curve	Sensitivity	Specificity	Feature type	Feature	Standard error^c	95% CI	Z statistic	p^d
0, 0.5	0.84	0.78	0.81	Volume	% HC Volume	0.04	0.76, 0.89	9.67	<0.0001
0.5, 1	0.71	0.77	0.67	Texture	Right HC Dost pc2	0.05	0.61, 0.8	4.03	0.0001
1, 2	0.56	0.58	0.59	Texture	Left HC Gabor pc 1	0.08	0.42, 0.69	0.74	0.46
0, 1	0.95	0.88	0.96	Texture	Left HC Dost pc1, Left HC LoGHist pc5, Left HC Gabor pc1, Right HC GLCM pc2	0.02	0.9, 0.98	22.88	<0.0001
0.5, 2	0.73	0.58	0.90	Texture	Left HC Gabor pc1, Left HC Dost pc1	0.08	0.61, 0.83	2.89	0.0038
0, 2	0.98	1.0	0.90	Volume	% HC Volume	0.01	0.93, 0.99	46.5	<0.0001

^a Gabor=Gabor filter banks, Dost=discrete orthonormal Stockwell transform, GLCM=gray-level co-occurrence matrices, HC=hippocampus, LBP=local binary patterns, LoGHist=Laplacian of Gaussian histograms, PC=principal component, % HC Volume=relative hippocampal volume in percent.

^b Participants with a CDR score of 0 were classified as having no impairment, those with a score of 0.5 were classified as having questionable cognitive impairment or mild cognitive impairment, those with a score of 1 were classified as having mild dementia or impairment, and those with a score of 2 were classified as having moderate dementia or impairment.

^c For further details, see DeLong et al. (73).

^d A p value below the significance level (0.05) indicates that the observed sample area under the receiver operating characteristic curve differs significantly from the null hypothesis (area=0.5).

Discussion

The well-established MR volume features and radiomics texture features had comparable and complementary utility in classifying cognitive groups and CDR categories. There is ample literature on the utility of imaging features extracted from MRI to assist in clinical diagnosis of probable Alzheimer’s disease. Several investigations have focused on using volume, shape, and other structural MR features in identifying cognitively normal, mild cognitive impairment, and Alzheimer’s disease groups (10, 13, 18, 26, 28, 30, 74–78). Texture features have also been used in identifying Alzheimer’s disease (14, 28, 32–35, 79). The literature is controversial about exactly what texture captures in the context of Alzheimer’s disease. Sørensen et al. (14) speculated that texture patterns may provide information on hippocampal function as a result of the significant correlation with [18F]fluorodeoxyglucose-positron emission tomography uptake. The same group also found that hippocampal texture, followed by hippocampal volume, were the most significant features in their algorithm to discriminate cognitive groups (35).

Our results are consistent with those of Sørensen et al. (14). For example, when they used only volume to discriminate between ADNI cognitively normal individuals and those with Alzheimer’s disease, they achieved an AUC of 0.91. In our case, we achieved an AUC of 0.89 on this task. Sørensen et al. (14) also used texture features to differentiate cognitively normal individuals from those with mild cognitive impairment with an AUC of 0.76, comparable to our AUC of 0.86 for the same task.

One technical difference between our methods and those of Sørensen et al. (14) is that Sørensen et al. resampled MR images in order to have consistency in image voxel size across their cohort. Resampling is often a necessary preprocessing step when images are obtained using different imaging protocols or devices. However, resampling involves interpolation, which can affect the spatial frequency content of the image. In order to establish a reliable baseline for the utility of texture features, we focused on images with a common voxel size in this study. We also used 3-T imaging for higher spatial resolution and contrast-to-noise ratios. Another difference between our work and that conducted by Sørensen et al. is that we used texture features to predict CDR scores. We were able to distinguish CDR 0 (no impairment) from 1 (mild dementia) with an AUC of 0.95. This model used a variety of texture features but not hippocampal volume. On the other hand, volume features alone were able to distinguish CDR 0 from 0.5 (questionable impairment) with an AUC of 0.84. They also were able to distinguish CDR 0 from 2 (moderate dementia) with an AUC of 0.98. Overall, our CDR models performed well at distinguishing cognitively normal people from those with early-stage or questionable cognitive impairment.

Distinguishing between CDR 1 and 2 was the most difficult task in our study, and AUC classification performance was poor, not achieving statistical significance (p=0.46). The transition from mild to moderate impairment appears to be a subtle shift without pronounced discernible changes in texture or hippocampal volume. While texture features suggest that CDR scores and neuropathology may have a relationship early in cognitive impairment (that is, early deposition of amyloid or tau), the lack of discrimination accuracy between CDR 1 and 2 suggests that the pathological depositions may not help in improving classification accuracy. Aisen et al. (80) posited that the terminology behind mild and moderate Alzheimer’s disease is inaccurate, because the individual has had the disease present for many years. The clinical staging nomenclature implies a clear distinction between various stages, but in reality, the process progresses in a more continuous manner (80).

As a result of technical limitations of our pipeline, we did not perform three-dimensional segmentation of the hippocampi. Instead, we used a 2.5D segmentation approach in which the hippocampi were segmented on several two-dimensional slices to increase texture sampling. In this approach, we manually placed two-dimensional ROIs (16×16 pixels) on the three slices with the largest cross-sectional view of the hippocampus. We acknowledge that the extracted ROIs may have included immediately adjacent anatomical structures such as the entorhinal cortex, resulting in mixed signals. In future studies, we plan to replicate this work using an automatic segmentation process.
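The 2.5D sampling idea can be sketched in a few lines. In the study the ROIs were placed manually; the version below instead assumes that a rough binary hippocampal mask is available and uses it to pick the three slices with the largest cross-section and to center a 16×16 patch on each. The mask, the centering rule, and all function names are hypothetical and serve only to illustrate the geometry.

```python
def slice_area(mask_slice):
    """Number of foreground pixels in one 2D mask slice."""
    return sum(sum(row) for row in mask_slice)


def centroid(mask_slice):
    """Rounded (row, col) center of the foreground pixels; the slice must not be empty."""
    points = [(r, c) for r, row in enumerate(mask_slice)
              for c, value in enumerate(row) if value]
    n = len(points)
    return (round(sum(r for r, _ in points) / n),
            round(sum(c for _, c in points) / n))


def top_slices(mask, k=3):
    """Indices of the k slices with the largest cross-sectional area, in slice order."""
    order = sorted(range(len(mask)), key=lambda z: slice_area(mask[z]), reverse=True)
    return sorted(order[:k])


def extract_patch(image_slice, center, size=16):
    """size x size patch around `center`, shifted inward if it would cross the border."""
    rows, cols = len(image_slice), len(image_slice[0])
    r0 = min(max(center[0] - size // 2, 0), rows - size)
    c0 = min(max(center[1] - size // 2, 0), cols - size)
    return [row[c0:c0 + size] for row in image_slice[r0:r0 + size]]


def rois_2p5d(volume, mask, k=3, size=16):
    """One 2D patch from each of the k largest cross-sections (volume and mask are [z][row][col])."""
    return [extract_patch(volume[z], centroid(mask[z]), size)
            for z in top_slices(mask, k)]
```

Because each patch is a fixed square rather than a contour of the structure, it can overlap neighboring tissue wherever the hippocampal cross-section is smaller than the patch, which is the source of the mixed signals noted above.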

Small sample size is another limitation of this study (N=173). When divided among CDR groups, each dataset consisted of few samples with a high-dimensional feature space, two known contributors to model overfitting. Because the sample was small, we did not split the dataset into training and test sets. To provide a realistic estimate of model performance and avoid overfitting, we adopted a nested CV scheme for model training and validation and a rather conservative threshold for feature selection (a minimum 5% improvement in CV accuracy). Given that our results are comparable to those of previous studies, we feel confident that the risk of overfitting was mitigated and that the results presented here are generalizable to external data. In the future, we aim to validate these results on larger external datasets. Lastly, the reader should note that we cannot claim clinical utility for the textural biomarkers introduced here, because the models were not tested prospectively.
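A nested CV scheme of this kind can be sketched as follows. The classifier is a diagonal quadratic discriminant (per-class means and variances with no covariance terms), and both loops are leave-one-out. The greedy forward selection procedure, the fallback to all features when none is selected, and every function name are assumptions made for illustration; the text specifies only the nested design and the 5% improvement threshold.

```python
import math


def fit_dqda(X, y):
    """Per-class feature means, variances, and prior (diagonal covariance)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # A small floor keeps variances positive for tiny or constant classes.
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (means, variances, n / len(X))
    return model


def predict_dqda(model, x):
    """Label with the highest Gaussian log-posterior (up to a shared constant)."""
    def score(params):
        means, variances, prior = params
        return math.log(prior) - 0.5 * sum(
            math.log(s) + (v - m) ** 2 / s
            for v, m, s in zip(x, means, variances))
    return max(model, key=lambda label: score(model[label]))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        model = fit_dqda(train_X, train_y)
        hits += predict_dqda(model, [X[i][f] for f in features]) == y[i]
    return hits / len(X)


def forward_select(X, y, min_gain=0.05):
    """Greedily add the best feature while it improves CV accuracy by at least min_gain."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        acc, f = max((loo_accuracy(X, y, selected + [f]), f) for f in remaining)
        if acc - best < min_gain:
            break
        selected.append(f)
        remaining.remove(f)
        best = acc
    return selected


def nested_loo(X, y, min_gain=0.05):
    """Outer leave-one-out; feature selection is redone inside every outer fold."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = forward_select(train_X, train_y, min_gain) or list(range(len(X[0])))
        model = fit_dqda([[row[f] for f in features] for row in train_X], train_y)
        hits += predict_dqda(model, [X[i][f] for f in features]) == y[i]
    return hits / len(X)
```

Calling `nested_loo(X, y)` with `X` as a list of feature rows and `y` as a list of class labels returns the outer-loop accuracy. The key design point is that the held-out case in the outer loop never influences which features are selected, so the reported accuracy is not inflated by the selection step.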

Conclusions

We used existing resources (ADNI-1 data) to introduce a new application of brain MR radiomics, using texture analysis and volumetric features, in the fields of aging, neuropsychiatry, and dementia. Our findings support the use of brain MR radiomics features for identifying early cognitive impairment, as many features are sensitive to early Alzheimer’s disease pathology. Future studies need to replicate these findings and should examine the clinical utility of MR texture features as Alzheimer’s disease biomarkers. Beyond volume and texture analysis of T₁ images of the hippocampus, future applications should expand to incorporate additional data sources. These could include additional MRI contrasts (for example, diffusion tensor imaging), fMRI, and PET. Additional brain structures known to be involved in Alzheimer’s disease progression could also be investigated.

Footnotes

Dr. Ranjbar and Ms. Velgos contributed equally to this study.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this article. A complete listing of ADNI investigators is available online (http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf).

The contents of this article are solely the responsibility of the authors and do not necessarily represent the official view of NIH.

References

Brookmeyer R, Johnson E, Ziegler-Graham K, et al: Forecasting the global burden of Alzheimer’s disease. Alzheimers Dement 2007; 3:186–191
