Quality assessment, variability and reproducibility of anatomical measurements derived from T1-weighted brain imaging: The RIN–Neuroimaging Network case study
Neuroradiology Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy; MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
Neuroradiology Unit, IRCCS Mondino Foundation, Pavia, Italy; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; NMR Research Unit, Department of Neuroinflammation, Queen Square MS Centre, UCL Queen Square, Institute of Neurology, Faculty of Brain Sciences, University College London, London, United Kingdom
• Reproducibility of a harmonized T1-weighted protocol at 3 T was tested in a multicentric study.
• The study measured a global variability that ranges from 11% to 19% for subcortical volumes.
• Moreover, we measured a global variability that ranges from 3% to 10% for cortical thicknesses.
• The Bland-Altman analysis on traveling brain measures did not detect systematic scanner biases.
• SVM can classify the scanner vendor from brain measures with an accuracy = 0.60 ± 0.14 (chance 0.33).
Abstract
Initiatives for the collection of harmonized MRI datasets are growing continuously, opening questions on the reliability of results obtained in multi-site contexts.
Here we present the assessment of the brain anatomical variability of MRI-derived measurements obtained from T1-weighted images, acquired according to the Standard Operating Procedures promoted by the RIN-Neuroimaging Network. A multicentric dataset composed of 77 brain T1w acquisitions of young healthy volunteers (mean age = 29.7 ± 5.0 years), collected in 15 sites with MRI scanners of three different vendors, was considered. In parallel, a dataset of 7 “traveling” subjects, each undergoing three acquisitions with scanners from different vendors, was also used. Intra-site, intra-vendor, and inter-site variabilities were evaluated in terms of the percentage standard deviation of volumetric and cortical thickness measures. Image quality metrics such as contrast-to-noise and signal-to-noise ratio in gray and white matter were also assessed for all sites and vendors.
The results showed a measured global variability that ranges from 11% to 19% for subcortical volumes and from 3% to 10% for cortical thicknesses. Univariate distributions of the normalized volumes of subcortical regions, as well as the distributions of the thickness of cortical parcels, appeared to be significantly different among sites in 8 subcortical (out of 17) and 21 cortical (out of 68) regions of interest in the multicentric study.
The Bland-Altman analysis on “traveling” brain measurements did not detect systematic scanner biases even though a multivariate classification approach was able to classify the scanner vendor from brain measures with an accuracy of 0.60 ± 0.14 (chance level 0.33).
In the last decades, non-invasive anatomical measurements derived from Magnetic Resonance Imaging (MRI) of the brain have played a pivotal role in the assessment of many diseases, including neurodevelopmental, neurodegenerative, psychiatric and rare conditions. Many of these measurements have already proven to be well-suited neuroimaging anatomical biomarkers for the early diagnosis and assessment of Alzheimer’s Disease [
The impact of automated hippocampal volumetry on diagnostic confidence in patients with suspected Alzheimer’s disease: A European Alzheimer’s Disease Consortium study.
]. In psychiatric and neurodevelopmental disorders, brain anatomical measurements have been shown either relevant or at least promising in the study of many diseases such as schizophrenia [
However, many challenges remain in the detection of structural brain biomarkers. Large sample sizes are needed to provide sufficient statistical power for the investigation of groups and subgroups and to deal with relatively small pathology effect sizes; hence multicentric studies are increasingly necessary for the development of both pharmacological and non-pharmacological interventions.
In this context, in most cases there are unclear recommendations for MRI image acquisition and analysis details for multivendor protocols using the standard equipment available in hospitals. Moreover, there are no clear quality control guidelines and reference values of different markers and brain measurements extracted from T1-weighted imaging together with unclear recommendations for retrospective harmonization of already existing data acquired with different protocols [
Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.
]. These factors hinder the advances in the field since they represent sources of variability which, in addition to the heterogeneity of the population under exam, often hamper the detection of subtle pathological changes, even in the context of the recent development of advanced and powerful artificial intelligence techniques [
For these reasons many initiatives were promoted for the harmonization of MRI acquisitions protocols and data analyses such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu/) [
With the same aim, the RIN - Neuroimaging Network, an Italian national consortium dedicated to share large-scale multimodal quantitative MRI datasets, promoted the development of guidelines for the data acquisition and processing [
]. In this study, we present the results obtained on brain structural MRI measures. In particular, we aimed to measure the anatomical variability of different brain structures, taking into account the influence on these measures of scanner vendor along with different hardware solutions used for data acquisition. In addition, we explored the variability of some image quality metrics which may have indirect effects on the image-derived anatomical measurements.
2. Material and methods
2.1 Description of the datasets
Data were acquired in fifteen sites of the RIN – Neuroimaging Network, equipped with 3 T MRI scanners from three different vendors (Philips Healthcare, GE Healthcare, and Siemens Healthineers). Two distinct studies were conducted. The first considered data acquired in 14 sites (multicentric study); the second considered data acquired on a small number of subjects that repeated the acquisition on three sites selected on the basis of scanner vendor and geographical area (traveling brain study).
2.2 Multicentric study
In order to assess the image quality and the anatomical variability of cerebral measurements derived from T1-weighted brain MRI in a multicentric framework, a dataset composed of 77 brain acquisitions obtained in as many young healthy volunteers (45F/32M, mean age = 29.7 ± 5.0 years, range [21–45] years) was considered. In particular, we collected 29 datasets from vendor 1 (mean age = 30.6 ± 5.4 years, 18F/11M), 18 from vendor 2 (mean age = 30.4 ± 4.6 years, 10F/8M), and 30 from vendor 3 (mean age = 28.4 ± 4.8 years, 17F/13M). Details on the subjects recruited in each participant center are reported in Table 1, along with the hardware information of the MR scanner (vendor code and number of channels of the receiving coils).
Table 1. Details on the technical characteristics of scanners (vendor, number of channels of the receiving head coils) of each site. Both site and scanner vendor were anonymized using a numerical code. For the multicentric study, the number of the subjects recruited at each site and their demographical data are reported. For the traveling brain study, columns indicate the geographical area of each site (North Area, AN, and South Area, AS), as well as the corresponding number of acquired subjects.
The inter-scanner variability was assessed with an additional dataset composed of 7 healthy traveling subjects who underwent three brain MRI acquisitions at three sites equipped with scanners from different vendors.
Two geographic areas (North Area, AN, and South Area, AS) were defined in Italy. Among the 7 traveling subjects, 4 subjects (mean age = 31.5 ± 2.2 years, 2F/2M) performed the T1-weighted MRI acquisitions in area AN, at sites 1, 8, and 10; the remaining 3 subjects (mean age = 28.4 ± 11.0 years, 2F/1M) were acquired in area AS, at sites 5, 9, and 15 (Table 1).
2.4 MRI brain imaging protocol
One of the main objectives of the RIN - Neuroimaging Network was the development of Standard Operating Procedures (SOPs) for the acquisition of a comprehensive MRI protocol for the brain. The complete set of scanning parameters for the acquisition of T1-weighted MRI imaging is reported in Table 2. The datasets, both for the multicentric study and the traveling brain study, were acquired according to the agreed SOPs.
Table 2. Parameters of acquisition for T1-weighted MRI, differentiated for each scanner vendor, as reported in the SOPs developed by RIN – Neuroimaging Network.

| Vendor | PHILIPS | GE | SIEMENS |
|---|---|---|---|
| Sequence type | 3D FFE | 3D FSPGR BRAVO | MP-RAGE |
| Slice orientation | sagittal | sagittal | sagittal |
| FOV [mm] | 240 × 240 | 256 × 256 | 256 × 256 |
| Resolution [mm³] | 1 × 1 × 1 | 1 × 1 × 1 | 1 × 1 × 1 |
| Matrix (Base Resolution) | 240 × 240 | 256 × 256 | 256 × 256 |
| Slice thickness | 1 | 1 | 1 |
| Slice gap (mm) | – | – | – |
| Number of slices | 175 – 180 | 175 – 180 | 175 – 180 |
| Phase Encoding direction | AP | AP | AP |
| Slice order | Interleaved | Interleaved | Interleaved |
| NSA/Averages/NEX | 1 | 1 | 1 |
| TR [ms] | 2300 | not modifiable | 2300 |
| TE [ms] | 2.96 | 3.2 | 2.96 |
| TI [ms] | 900 | 900 | 900 |
| Flip angle | 9° | 9° | 9° |
| Fat Suppression | No | No | No |
| k-space coverage (Halfscan/Partial Fourier) | No | No | No |
| Acceleration factor | SENSE ≤ 2.3 | ARC = 2 | GRAPPA = 2 |
| Filter | CLEAR on | PURE on | Prescan Normalize on |
| Bandwidth (Hz/pixel) | 191 | 122 | 240 |
| Duration | ≈ 5 min 30 sec | ≈ 5 min 30 sec | ≈ 5 min 30 sec |
FFE = Fast Field Echo; FSPGR = Fast SPoiled GRadient echo; BRAVO = BRAin VOlume imaging; MPRAGE = Magnetization Prepared Rapid Gradient Echo.
]). Firstly, we converted 3D T1-weighted MR brain images from DICOM format to NIfTI format. Secondly, we used the FreeSurfer (FS) pre-processing workflow, known as the recon-all analysis pipeline, which processes the input structural MRI scan through several FS functions, performing the full cortical reconstruction in 31 processing steps. To carry out gray matter tissue segmentation, FS combines several sources of information, such as image intensities, global position within the brain and position relative to neighboring brain regions. Based on this information, it uses a probabilistic atlas in which coordinates have anatomical meaning, and a Markov Random Field (MRF) model is used to capture local spatial relationships between labeled structures. FreeSurfer implements a model based on a mixture of a small number of Gaussians for each structure at each point in space and a maximum a posteriori estimate of the model parameters to assign one of the Region of Interest (ROI) labels to each voxel. From the FS segmentation results, we extracted the volumes (mm³) of the subcortical gray matter structures and the thicknesses (mm) of the cortical regions. Along with the brain structures, FS was used to measure the total intra-cranial volume, which is a well-established measurement for volume normalization across subjects [
]. In Fig. 1A an example of brain structural T1-weighted images is shown together with the overlay of FreeSurfer segmentation results (in false color) of subcortical and cortical gray matter structures (Fig. 1B).
Fig. 1. A. Sagittal, coronal and axial views of a raw 3D T1-weighted image of a representative subject of the dataset. B. Overlay of FreeSurfer segmentation results (in false color) of subcortical and cortical gray matter structures.
] (CNR): the CNR evaluates how well separated the signal intensity distributions of adjacent tissues are. Here, CNR refers specifically to the contrast between GM and WM. Higher values indicate a better definition of gray matter structures with respect to the surrounding areas. Additionally, the contrast-to-noise ratio was evaluated between GM and CSF (CNRGMCSF) in order to investigate the impact of this different contrast on the segmentation of GM structures surrounded by CSF.
- Signal-to-Noise Ratio (SNR): the SNR evaluates how much the signal intensity in a specific region is significant with respect to the noise fluctuations. It is calculated as the ratio between the mean intensity of the considered tissue and its standard deviation in the same region.
] (EFC): the EFC uses the Shannon entropy of voxel intensities as an indication of ghosting and blurring. Lower values indicate fewer artifacts and better image quality.
- Coefficient of Joint Variation (CJV): the CJV of gray (GM) and white matter (WM) was proposed for the evaluation of intensity non-uniformity. Higher values indicate worse image quality due to the presence of heavy head motion and large intensity non-uniformity artifacts.
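As a rough illustration of two of the metrics listed above, the sketch below computes the SNR as described in the text (mean intensity of a tissue divided by its standard deviation in the same region) and the CJV in its commonly used form, (σ_GM + σ_WM) / |μ_WM − μ_GM|. The voxel intensities are hypothetical, and the use of the population standard deviation is an assumption; the quality control tool used in the study may apply different estimators or corrections.

```python
from statistics import mean, pstdev

def snr(tissue):
    """Signal-to-noise ratio: mean intensity of a tissue over its standard deviation."""
    return mean(tissue) / pstdev(tissue)

def cjv(gm, wm):
    """Coefficient of joint variation of GM and WM (higher values = worse quality)."""
    return (pstdev(gm) + pstdev(wm)) / abs(mean(wm) - mean(gm))

# Hypothetical voxel intensities sampled inside GM and WM masks (not study data)
gm_voxels = [60.0, 64.0, 58.0, 62.0, 66.0]
wm_voxels = [100.0, 104.0, 98.0, 102.0, 96.0]

snr_gm = snr(gm_voxels)
snr_wm = snr(wm_voxels)
cjv_value = cjv(gm_voxels, wm_voxels)
print(f"SNR_GM={snr_gm:.2f}  SNR_WM={snr_wm:.2f}  CJV={cjv_value:.3f}")
```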
3.3 Variability assessment
The variability of anatomical measurements of cortical and subcortical regions was assessed through the standard deviation of the measures on the whole multicentric data set, in terms of percentage with respect to the corresponding mean value. Moreover, the minimum and the maximum of the standard deviation were calculated for intra-site, inter-site and intra-vendor scenarios.
The distributions of quality control measurements were also calculated separately for different vendors, for different scanner models and number of elements of the receiving RF coils.
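The percentage standard deviation described above can be sketched as follows. This is a minimal illustration on hypothetical values, assuming the sample standard deviation and assuming that the intra-vendor figure is obtained by pooling all subjects scanned with the same vendor; the function and field names are not taken from the study.

```python
from collections import defaultdict
from statistics import mean, stdev

def percent_sd(values):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

def variability_summary(records):
    """records: iterable of (site, vendor, value) for one anatomical measure."""
    by_site, by_vendor = defaultdict(list), defaultdict(list)
    for site, vendor, value in records:
        by_site[site].append(value)
        by_vendor[vendor].append(value)
    # Sites with a single subject have no defined standard deviation and are skipped
    intra_site = [percent_sd(v) for v in by_site.values() if len(v) > 1]
    return {
        "intra_site_min": min(intra_site),
        "intra_site_max": max(intra_site),
        "intra_site_mean": mean(intra_site),
        "intra_vendor": {k: percent_sd(v) for k, v in by_vendor.items() if len(v) > 1},
        "inter_site_global": percent_sd([value for _, _, value in records]),
    }

# Hypothetical TIV-normalized volumes (arbitrary units), not study data
records = [
    (1, "V1", 5.0), (1, "V1", 5.4), (1, "V1", 5.2),
    (2, "V2", 4.8), (2, "V2", 5.6),
    (3, "V1", 5.1), (3, "V1", 4.7), (3, "V1", 5.5),
]
summary = variability_summary(records)
print(summary)
```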
3.4 Statistical analyses
The comparisons of the mean values of the extracted measurements among the participant centers were performed with an ANOVA test and the statistical significance threshold of p-value = 0.01 was set (both uncorrected and with False Discovery Rate correction).
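The text does not state which False Discovery Rate procedure was applied. As one common choice, the sketch below implements the Benjamini-Hochberg step-up procedure at the 0.01 level and compares it with simple uncorrected thresholding. The p-values are hypothetical placeholders for per-region ANOVA results.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    # Reject every hypothesis whose rank is at or below that cutoff
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Hypothetical per-region ANOVA p-values (not study data)
p_uncorrected = [0.03, 0.0001, 0.2, 0.009, 0.003]
significant_uncorrected = [p < 0.01 for p in p_uncorrected]
significant_fdr = benjamini_hochberg(p_uncorrected, alpha=0.01)
print(significant_uncorrected, significant_fdr)
```

In this toy example the region with p = 0.009 passes the uncorrected threshold but not the corrected one, which is the kind of difference the two reporting modes are meant to expose.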
For the traveling brain study, Bland-Altman plots were used as a complementary approach to evaluate the agreement between the extracted measures and to assess the variability at both the subject and the traveling brain cohort level. In order to assess potential biases limited to specific regions, paired t-tests were performed for each anatomical measurement for each pair of vendors under analysis.
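A Bland-Altman summary for one pair of vendors can be sketched as below. The sketch assumes that differences are expressed as a percentage of the mean of each pair of measurements and that the 95% limits of agreement are taken as bias ± 1.96 standard deviations; both conventions are assumptions, and the values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Percentage differences between paired measures: bias, SD, 95% limits."""
    diffs = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical measures of the same structures on two scanners (not study data)
vendor_a = [4.1, 3.9, 4.3, 4.0]
vendor_b = [4.0, 4.0, 4.2, 4.1]
bias, sd, (lower, upper) = bland_altman(vendor_a, vendor_b)
print(f"bias={bias:.2f}%  sd={sd:.2f}%  limits=({lower:.2f}%, {upper:.2f}%)")
```

A bias close to zero with respect to its spread is what would be read as the absence of a systematic scanner effect.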
The segmentation of the images, the quality control and the statistical analyses were performed at a single site, under the same operating system in order to avoid additional sources of variability [
3.5 Machine learning experiments on the multicentric data
In order to assess the residual dependency of the brain anatomical measurements on the acquisition characteristics after the application of the SOPs, a simple Support Vector Machine (SVM) classifier [
] was trained on the anatomical measures extracted by the segmentation algorithm to recognize the Vendor (Vendor 1, Vendor 2, Vendor 3) that manufactured the scanner.
The same classification problem was tackled by feeding a linear SVM classifier with the quality metrics (CJV, CNR, SNR_WM, SNR_GM, EFC) extracted through the MRIQC method.
In both scenarios, the training and testing procedure was performed through a cross validation approach (8-fold cross validation). The classification performance was assessed by measuring the mean accuracy and the standard deviation of the accuracy on the 8 validation folds.
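The vendor classification experiment can be sketched with scikit-learn as follows. The feature matrix is random synthetic data with the same shape as the multicentric dataset (77 subjects, 17 subcortical plus 68 cortical measures, class sizes 29/18/30). Feature standardization, stratified folds and the fixed random seeds are assumptions not stated in the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the anatomical measures (not study data)
y = np.array([0] * 29 + [1] * 18 + [2] * 30)  # vendor labels V1, V2, V3
X = rng.normal(size=(y.size, 17 + 68))
X[y == 1, :10] += 0.5  # hypothetical vendor-related shift in a few features
X[y == 2, 10:20] -= 0.5

# Linear SVM evaluated with 8-fold cross validation
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

mean_acc, std_acc = scores.mean(), scores.std()
chance = 1.0 / 3.0  # nominal chance level for three classes
print(f"accuracy = {mean_acc:.2f} ± {std_acc:.2f} (chance {chance:.2f})")
```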
4. Results
Fig. 2 reports the images obtained with all vendors, from two subjects of the travelling brain study, one acquired in the North area (AN, left panel) and one in the South area (AS, right panel).
Fig. 2. T1w images acquired with the three vendors on the same two subjects: one subject in the North area, AN (left column) and one subject in the South area, AS (right column). The 3D images were registered to the MNI-152 (Montreal Neurological Institute) template and intensity rescaled between 0 and 1 by using as a min reference the 1st intensity percentile and as a max reference the 99th intensity percentile.
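The percentile-based intensity rescaling mentioned in the caption can be sketched with NumPy as below. Clipping values outside the 1st to 99th percentile range to 0 and 1 is an assumption, and the input volume is synthetic.

```python
import numpy as np

def rescale_intensity(image, low_pct=1.0, high_pct=99.0):
    """Map the low/high intensity percentiles to 0 and 1 (values outside are clipped)."""
    lo, hi = np.percentile(image, [low_pct, high_pct])
    scaled = (image - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)

# Synthetic 3D volume standing in for a registered T1w image (not study data)
rng = np.random.default_rng(0)
volume = rng.gamma(shape=2.0, scale=200.0, size=(16, 16, 16))
rescaled = rescale_intensity(volume)
print(rescaled.min(), rescaled.max())
```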
Fig. 3 shows two examples of distributions of volume and thickness measurements obtained across sites and vendors. Left and right hippocampal values (normalized to the total intracranial volume) are reported together with the thicknesses of the left and right precuneus cortices. The chosen structures are particularly suited for the study of neurodegeneration and aging, since they are strongly involved in cognition and memory.
Fig. 3. Examples of box plots of anatomical measurement distributions across the different sites (left panels, site numerical codes on the x-axis) and the different vendors (right panels, vendor numerical codes on the x-axis). In the first row, the volumes of the left and right hippocampus (normalized to the total intracranial volume (TIV)) are reported. In the second row, the thicknesses of the left and right precuneus cortices are shown. The bottom and top edges of each box indicate the 25th and 75th percentile of the measure distribution respectively, and the central line indicates the median. Color code: blue for Vendor 1 (V1), orange for V2, and green for V3. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
For a more exhaustive description of the results, the values of volume variability of all the segmented subcortical structures on the multicentric dataset are reported in Table 3. It reports, for each structure, the minimum, the maximum, and the mean of intra-site percentage variability, to assess the range of volume variation within a single site scenario. The variations were evaluated on volume values normalized to the total intra-cranial volume (TIV). Analogously, the intra-vendor and inter-site variabilities are reported. In addition, the global inter-site mean volume values and statistical significance of the ANOVA test on the compatibility of inter-site sampling are shown. The intra-site minimum variation ranges from 1.91% to 10.31%. The maximum ranges from 15.79% to 27.93% (global mean 11.36%). The intra-vendor variability calculated on the three separate datasets ranges from 5.44% (V2) to 17.70% (V3). Considering the average across all the subcortical areas, the mean variabilities among sites of the same vendor are 14.69% for V1, 8.72% for V2, 9.74% for V3; the latter values are comparable to the inter-site variability calculated on the entire dataset which ranges from 11.4% to 19.13%, with an average of 13.84%.
Table 3. Anatomical variabilities of the measurements of the volume of subcortical structures. Intra-site analysis: minimum, maximum and mean standard deviations calculated on the 14 sites. Intra-vendor analysis: mean standard deviations calculated on datasets from sites of the same vendor (V1, V2, V3 as in Table 1). Inter-site analysis: global standard deviation and mean of volume measurements, and statistical significance of the ANOVA test on the mean compatibility across sites (*p-value < 0.01). The variations were evaluated on volumes normalized to the total intracranial volume (TIV).
In all the considered cases the highest variability is found for the values of the nuclei accumbens which are small and difficult structures to be segmented by an automated tool.
Similarly, Table 4 reports the measured cortical thickness variations across brain cortex parcels. The intra-site minimum variation ranges from 0.3% to 5.0%. The maximum ranges from 5.0% to 15.5% (global mean 5.1%). The intra-vendor variability of brain cortex thickness ranges from 3.1% (V1) to 12.2% (V2). Considering the average across all the cortical areas, the mean variabilities among sites of the same vendor are 5.3% for V1, 5.5% for V2, 5.4% for V3; the latter values are comparable to the inter-site variability of brain cortex thickness calculated on the entire dataset, which ranges from 3.3% to 10.4%, with an average of 5.7%.
Table 4. Anatomical variabilities of the measurements of the thicknesses of cortical structures. Intra-site analysis: minimum, maximum, and mean standard deviations calculated on the 14 sites. Intra-vendor analysis: mean standard deviations calculated on datasets from sites of the same vendor (V1, V2, V3 as in Table 1). Inter-site analysis: global standard deviation and mean of thickness measurements and statistical significance of the ANOVA test on the mean compatibility across sites (*p-value < 0.01).
The distributions of the normalized volumes of subcortical regions, as well as the distributions of the thickness of cortical parcels appeared to be significantly different among sites in 8 subcortical (out of 17) and 21 cortical (out of 68) ROIs.
4.2 Quality control measurements
In Fig. 4 the distributions across sites of each quality control metric are reported. For all of them, the intra-site distributions of the values appeared to be peculiar for each considered site, with a variability that is very different with respect to the global one.
Fig. 4. Box plots of quality control metrics distributions across sites (left panels, site codes on the x-axis) and vendors (right panels, vendor codes on the x-axis): (A) the contrast-to-noise-ratio (CNR), (B) the contrast-to-noise-ratio between GM and CSF (CNRGMCSF), (C) the signal-to-noise ratio for gray (SNRGM) and (D) white matter (SNRWM), (E) the entropy focus criterion (EFC) and (F) the coefficient of joint variation (CJV), are reported. Color code as in Fig. 2 (blue for Vendor 1 (V1), orange for V2, and green for V3). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
In order to disentangle the contributions to the quality metrics variability, the contrast-to-noise ratio and the signal to noise ratio in brain gray matter were aggregated not only by scanner vendor, but also by scanner model and number of channels of the head coils (Fig. 5). The distributions were strongly dependent on the vendor, while neither the specific scanner model nor the number of elements of the head coils seemed to have a significant impact on the considered metrics.
Fig. 5. Distributions of contrast-to-noise ratio (CNR) and signal-to-noise ratio in gray matter (SNRGM) on data aggregated by vendor and scanner model: 3 models for V1, 2 models for V2 and 4 models for V3. The boxplot colors correspond to the number of channels of the head coils.
The linear SVM classifier, trained on the anatomical measurements extracted by the segmentation algorithm on the multicentric study, was able to classify the scanner vendor with an average accuracy of 0.60 ± 0.14. This value should be compared with the chance level, which is equal to 0.33 for a three-class classification. A similar SVM classifier, trained on the quality control metric extracted through the MRIQC protocol was able to classify the scanner vendor with an average accuracy of 0.87 ± 0.13 (chance level 0.33).
4.4 Traveling brain variability
Fig. 6 and Fig. 7 report the Bland-Altman plots for the assessment of the reproducibility across different vendors of subcortical volume and cortical thickness measurements, respectively. For each figure, the first row reports the comparisons of the traveling data collected in AN, while the second row shows the results for the traveling data collected in AS. In none of the cases does the mean value of the difference significantly differ from 0 on the basis of a 1-sample t-test, indicating that there are no systematic biases.
Fig. 6. Bland-Altman plots for the reproducibility assessment of the measurements of subcortical volumes across different vendors (V1, V2, V3), considering the traveling brain data collected in AN (first row, 4 subjects) and AS (second row, 3 subjects). To simplify reading, the homologous structures in left and right hemispheres were represented with the same color.
Fig. 7. Bland-Altman plots for the reproducibility assessment of the measurements of cortical thickness of brain parcels across different vendors (V1, V2, V3), considering the traveling brain data collected in AN (first row, 4 subjects) and AS (second row, 3 subjects). To simplify reading, the homologous structures in left and right hemispheres were represented with the same color.
The mean value of the percentage of variation in measuring the volumes of deep structures varies from 0.2% to 1.3%, while the standard deviation ranges in both data sets from about 5% to 8%. The highest values of variation, associated with a lower reproducibility level, are related to nuclei accumbens, which are small and challenging structures to segment, as already stated (purple dots in Fig. 6).
Analogously, for the measurement of cortical thicknesses, the mean value of the percentage of variation goes from 0.2% to 3.1%, with the standard deviation ranging from about 4% to 7%. In these structures, the highest variations in the AN traveling data set seem to be related to cortical regions with high thickness values such as some temporal regions (temporal pole, the transverse temporal cortex), the insula, and the entorhinal cortex. However, the same trend does not show in the AS traveling subject data set, where the values outside the 95% distribution boundaries are more spread across the entire range of thickness values.
In the Supplementary material the uncorrected and FDR-corrected p-values of the paired t-tests for the repeated subcortical volume and cortical thickness measurements with different vendors are reported for AN and AS subjects.
For the subcortical regions, no significant differences between any pair of vendors were obtained in AN, while only the measurement of the volume of the thalamus was significantly different between V1 and V2 in AS.
For the cortical regions, significant differences were found between V1 and V2 in AN (inferior temporal gyrus, lateral orbito-frontal gyrus, post-central gyrus, superior parietal gyrus) and in AS (inferior temporal gyrus). Analogously, V1 and V3 show significant differences both in AN (inferior temporal gyrus, post-central gyrus) and in AS (inferior temporal gyrus). No statistically significant differences were obtained between V2 and V3 in either AN or AS.
5. Discussion
The work of the RIN – Neuroimaging Network started to address some of the urgent challenges for a full exploitation of MRI biomarkers for diagnosis and prognosis in the field of neurology [
Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.
]. In particular, the project developed Standard Operating Procedures for the acquisition of T1-weighted MRI of the brain, adapted to multivendor scenarios and suitable for the equipment available in hospitals. On this basis, in this study a set of reference values for different anatomical cerebral structures was extracted from a population of young healthy subjects and the residual variability after the harmonization of the acquisition protocol was assessed in cortical and subcortical regions, by segmenting the images with one of the most widespread techniques (the FreeSurfer utility). For subcortical volumes, we observed a residual intra-site minimum variation that ranges from about 2% to 10% and a maximum intra-site variation that ranges from 16% to 28%. The inter-site variability calculated on the entire multicentric dataset ranges from about 11% to 19%, whilst the intra-vendor variability ranges from 5% to 18%. As expected, the volume variability changes considerably, depending on the intrinsic characteristics of the segmented subcortical structures, with some distributions across sites that are statistically different, even after total intracranial volume normalization, in specific structures such as hippocampus, amygdala, globus pallidus and nucleus accumbens. The same applies to cortical thickness measurements, for which the percentage variation is lower: the intra-site minimum variation ranges from 0.3% to 5% and the maximum ranges from about 5% to 16%. The inter-site variability of brain cortex thickness calculated on the entire dataset ranges from 3% to 10%, similarly to the intra-vendor variability, which ranges from 3% to 12%. The thickness distributions of cortical parcels are statistically different across sites in about 30% of regions, equally distributed among hemispheres (11 ROIs in left and 10 ROIs in right hemisphere).
Multicentric studies targeting alterations of specific brain regions in terms of volume or thickness usually plan the multicentric acquisition settings to minimize the variability due to acquisition parameters. We observed in this study that, even after the definition of MRI Standard Operating Procedures which minimize the variability in the acquisition parameters, a complete image harmonization is not achieved. A non-negligible residual variability is present, due to the test–retest variability combined with the variations in T1-weighted images induced by the input parameters specific to each vendor. Thus, the expected pathological effect (e.g. the amount of cortical thinning in a specific region of interest or the volume enlargement in a deep structure due to pathophysiological mechanisms) in such studies must be compared to this residual variability in order to estimate appropriate sample sizes (both at the global and at the intra-site level).
Quality control measures analyses, indeed, confirmed that the T1-weighted MRI images of the brain are still strongly dependent on the vendor in terms of contrast to noise and signal to noise in different brain tissues even after the definition of Standard Operating Procedures for brain MRI acquisition, in part also observable in Fig. 2. On the other hand, the same analyses ruled out the possibility that systematic signal alterations with a significant impact on the brain structures measurements were due to the number of channels of the head coils or to a specific scanner model of the same vendor (Fig. 5).
However, it is important to point out that differences in quality control metrics distributions could also be generated by the imperfect harmonization of T1-weighted sequences across vendors. An ideal match of different sequences with the same weighting, but from different vendors, would have required changing several variables that are often not accessible to the radiographer. Instead, the SOPs were developed to help the operator set up the protocol on a commercial scanner equipped with common sequences by changing simple parameters.
Even if beneficial, the definition of SOPs does not guarantee the similarity in quantitative volumetric measures. The CNR between gray and white matter seems to be the main driving feature for the automated gray matter structures segmentation. Indeed, even though the intensity range is visually well matched for vendor 1 and 2 (Fig. 2), the CNR is different (as shown in Fig. 4, Fig. 5) and some discrepancies appear in intra-subject measurements (Fig. 6, Fig. 7). Conversely, when the difference in CNR is smaller (vendor 1 and 3) despite a remarkable visual difference (Fig. 2), there is a better similarity in gray matter measures on the same subjects.
The residual impact of the scanner vendor on the brain measurements was detectable with a very simple machine learning experiment on vendor prediction which obtained accuracy values not compatible with the chance level. This is in line with previous studies [
] which demonstrated that a poorly designed training set can produce sample- and site-dependent classifiers: models originally intended for the detection of novel anatomical biomarkers of pathologies can show significantly positive performances merely because of an underlying, uncontrolled capability in site classification. For these reasons, particular care should be taken in designing machine learning experiments on multivariate T1-weighted MRI derived measurements and in deep learning approaches, which are even more sensitive to subtle intensity variations due to scanner properties, even for images acquired with Standard Operating Procedures and well controlled protocols.
The traveling-subject analyses showed good agreement in both subcortical and cortical measurements obtained on the same subjects with scanners from different vendors. The residual variability in the volumes of deep structures, calculated as the standard deviation of the percentage variation in Bland-Altman plots, ranges in both data sets from about 5% to 8%, depending on the structure considered. The highest variation, indicating lower reproducibility, was found for the nucleus accumbens, a small structure that is challenging to segment.
Regarding the measurements of cortical thickness, the standard deviation values range in both data sets from about 4% to 7%; the highest variations in the AN traveling data set seem to be related to cortical regions with high thickness values, such as the temporal regions (temporal pole, transverse temporal cortex), the insula, and the entorhinal cortex.
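The variability figure quoted above, the standard deviation of the percentage variation in a Bland-Altman analysis, can be sketched as follows. This is a minimal illustration, assuming the percentage variation is the difference between two paired measurements divided by their mean; the function name and the volumes are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman_percent(a, b):
    """Bland-Altman summary of paired measurements, in percent.

    Returns (bias, sd, lower_limit, upper_limit), where each percent
    difference is 100 * (a_i - b_i) / mean(a_i, b_i) and the limits of
    agreement are bias -/+ 1.96 * sd.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences with at least 2 items")
    pct = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    bias = mean(pct)
    sd = stdev(pct)  # sample standard deviation (n - 1 denominator)
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical volumes (mm^3) of one structure for the same subjects
# measured on scanners from two vendors.
vendor_a = [7450.0, 7120.0, 7890.0, 7300.0]
vendor_b = [7300.0, 7310.0, 7700.0, 7650.0]
bias, sd, lower, upper = bland_altman_percent(vendor_a, vendor_b)
print(f"bias = {bias:.2f}%, SD = {sd:.2f}%, limits = [{lower:.2f}%, {upper:.2f}%]")
```

A bias close to zero with limits of agreement that include zero is the pattern one would expect when no systematic scanner offset is present.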
Except for the difference between V1 and V2 of AS in the thalamus, the areas that show statistically significant differences are cortical regions: the inferior temporal gyrus, the lateral orbitofrontal gyrus, the postcentral gyrus and the superior parietal gyrus. The main contribution to the increased variability in these regions may be the increased test–retest variability [Knussmann GN, Anderson JS, Prigge MBD, Dean DC, Lange N, Bigler ED, et al. Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images. Neuroimage: Reports 2022;2:100086. doi: 10.1016/J.YNIRP.2022.100086].
The order of magnitude of these intra-subject percent variations must be put in the context of normal aging or pathological alterations such as those related to neurodegenerative processes [
]. For example, atrophy is roughly threefold milder in normal aging than in Alzheimer’s disease (AD): 5% vs. 18% in the medial temporal lobe [Dickerson BC, Bakkour A, Salat DH, Feczko E, Pacheco J, Greve DN, et al. The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb Cortex 2009;19:497–510. doi:10.1093/cercor/bhn113].
Longitudinal studies on specific anatomical MRI biomarkers of the brain should then be designed according to these variability values and to the expected pathological effect in order to determine the correct experimental sample sizes.
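One rough way to turn a variability value and an expected effect into a sample size is the standard normal-approximation formula for comparing two group means, n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group. The sketch below is only an illustration under that assumption: the function name is hypothetical, the example numbers are not taken from the study, and σ should be the total standard deviation relevant to the chosen design (a paired longitudinal design would use the standard deviation of within-subject changes instead).

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sd_percent, effect_percent, alpha=0.05, power=0.80):
    """Subjects per group needed to detect a mean difference.

    Normal-approximation formula for a two-sided, two-sample comparison:
    n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / effect^2.
    sd_percent and effect_percent must be in the same units (here, %).
    """
    if sd_percent <= 0 or effect_percent <= 0:
        raise ValueError("sd_percent and effect_percent must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 * (sd_percent / effect_percent) ** 2
    return ceil(n)


# Hypothetical scenario: 8% variability of a volume, 5% expected atrophy.
print(sample_size_per_group(sd_percent=8.0, effect_percent=5.0))
```

Because the required sample size scales with the square of the ratio between variability and effect, even a modest reduction of inter-scanner variability can reduce the number of subjects noticeably.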
In general, even though quality control measures remain strongly dependent on the scanner vendor even after the acquisition protocol has been defined, the agreement of the traveling brain anatomical measurements suggests that good reproducibility can be achieved at the inter-site level on the same subjects, with an overall variability of 5–8%. This intra-subject variability, which is mainly due to the residual differences after image protocol definition, contributes to the measured intra-vendor (5–18%) and mean intra-site (9–17%) variability, which were evaluated on different subjects and are thus also affected by the inter-subject variability component. However, the ability of a simple SVM classifier to identify the scanner vendor with an accuracy well above chance level underlines the risk that multivariate approaches can be particularly sensitive to subtle changes in image intensity that are reflected in high-level anatomical measures.
6. Limitations
The global variability that we assessed in the multicentric experiment has many different sources: the test–retest variability [Knussmann GN, Anderson JS, Prigge MBD, Dean DC, Lange N, Bigler ED, et al. Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images. Neuroimage: Reports 2022;2:100086. doi: 10.1016/J.YNIRP.2022.100086], the inter-vendor variability, the inter-site variability, and the inter-subject variability. By disaggregating the data by vendor, we assessed the intra-vendor variability in order to check whether systematic biases could be observed; by performing the traveling brain experiment, we assessed the inter-vendor/inter-site variability. However, in none of these cases did we assess the test–retest variability, which intrinsically contributes to each of them. The aim of our study was to assess the variability and reproducibility of morphometric measures derived from T1w images across different sites in a clinical setting after providing Standard Operating Procedures, which is one of the most common scenarios in clinical research; in such a setting, the test–retest variability will always contribute to the global variability.
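The way these sources combine can be written down under a simplifying assumption that is not tested in this study, namely that the sources are independent and additive. The symbols below are introduced only for illustration:

```latex
\sigma^2_{\mathrm{global}} \approx
  \sigma^2_{\mathrm{tr}} + \sigma^2_{\mathrm{vendor}}
  + \sigma^2_{\mathrm{site}} + \sigma^2_{\mathrm{subj}}
```

Here $\sigma_{\mathrm{tr}}$ denotes the test–retest component, and the other terms denote the inter-vendor, inter-site and inter-subject components. Under this assumption, the traveling brain experiment removes $\sigma^2_{\mathrm{subj}}$ but still contains $\sigma^2_{\mathrm{tr}}$, which is why the test–retest contribution cannot be separated from the inter-vendor/inter-site estimate without repeated scans on the same scanner.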
As discussed above, another limitation of this study is the non-ideal harmonization of T1w sequences across vendors. The Standard Operating Procedures were defined by seeking a compromise between protocol matching and image acquisition in a clinical environment, imposing a uniform spatial resolution and a similar acquisition time. To minimize the variability across vendors, a better harmonization of parameters should be carried out, even though this could require modifying advanced sequence variables, which is not easily feasible in a clinical setting.
As described in the Methods section, the study was designed using only one segmentation algorithm. It was chosen because it is widely used and well characterized in many contexts. Different segmentation approaches could in principle have different impacts on the evaluation of the variability of anatomical measurements at both subcortical and cortical levels [
], being more or less sensitive to subtle signal variations in different brain areas. The results for both intra- and inter-site variability could be affected by the small number of subjects collected at each site (mean and standard deviation of 5.5 ± 1.1 subjects per site), which can produce an overestimation of such variability. The number of subjects in the two traveling brain experiments is also limited. Further studies should increase the intra-site sampling in order to reach a more robust statistical evaluation, along with larger traveling brain experiments across different sites.
7. Conclusions
The work of the RIN – Neuroimaging Network allowed the acquisition, in a multicentric framework, of a normative dataset of cerebral T1-weighted MRI of young healthy subjects using Standard Operating Procedures. The analyses of the MRI-derived measurements allowed the extraction of normative anatomical reference values together with their variability. Acquisitions with the same protocol on a dataset of traveling subjects made it possible to disentangle the contribution of subject anatomical variability from the vendor impact. Although good agreement was shown, the impact of the acquisition scanner on the MRI-derived anatomical measures is still not negligible and is detectable through simple data mining approaches, particularly multivariate classifiers.
8. The RIN Neuroimaging Network
Maria Grazia Bruzzone (Fondazione IRCCS Istituto Neurologico Carlo Besta), Claudia A. M. Gandini Wheeler-Kingshott (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, UCL Queen Square Institute of Neurology, University of Pavia), Michela Tosetti (Fondazione IRCCS Stella Maris), Alberto Redolfi (IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli), Egidio D'Angelo (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, University of Pavia), Gianluigi Forloni (Istituto di Ricerche Farmacologiche Mario Negri IRCCS), Raffaele Agati (IRCCS Istituto delle Scienze Neurologiche di Bologna), Marco Aiello (IRCCS SDN Istituto di Ricerca), Elisa Alberici (IRCCS Istituti Clinici Scientifici Maugeri), Carmelo Amato (Oasi Research Institute-IRCCS), Domenico Aquino (Fondazione IRCCS Istituto Neurologico Carlo Besta), Filippo Arrigoni (Istituto Scientifico, IRCCS E. Medea), Francesca Baglio (IRCCS Fondazione don Carlo Gnocchi onlus), Stefano Bastianello (Fondazione IRCCS Istituto Neurologico Naz.le Mondino), Laura Biagi (Fondazione IRCCS Stella Maris), Lilla Bonanno (IRCCS Centro Neurolesi Bonino Pulejo), Paolo Bosco (Fondazione IRCCS Stella Maris), Francesca Bottino (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Marco Bozzali (Fondazione IRCCS Santa Lucia), Chiara Carducci (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Irene Carne (IRCCS Istituti Clinici Scientifici Maugeri), Lorenzo Carnevale (IRCCS Neuromed), Antonella Castellano (IRCCS Ospedale San Raffaele), Carlo Cavaliere (IRCCS SDN Istituto di Ricerca), Mattia Colnaghi (Istituto Auxologico Italiano IRCCS), Giorgio Conte (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico), Mauro Costagli (University of Genova; Fondazione IRCCS Stella Maris), Silvia De Francesco (IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli), Greta Demichelis (Fondazione IRCCS Istituto Neurologico Carlo Besta), Valeria Elisa Contarino (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico), Andrea Falini (IRCCS Ospedale San Raffaele), Stefania Ferraro (Fondazione IRCCS Istituto Neurologico Carlo Besta), Giulio Ferrazzi (IRCCS Ospedale San Camillo), Lorenzo Figà Talamanca (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Cira Fundarò (IRCCS Istituti Clinici Scientifici Maugeri), Simona Gaudino (IRCCS Fondazione Policlinico Universitario Agostino Gemelli), Francesco Ghielmetti (Fondazione IRCCS Istituto Neurologico Carlo Besta), Ruben Gianeri (Fondazione IRCCS Istituto Neurologico Carlo Besta), Giovanni Giulietti (Fondazione IRCCS Santa Lucia), Marco Grimaldi (IRCCS Istituto Clinico Humanitas), Antonella Iadanza (IRCCS Ospedale San Raffaele), Marta Lancione (Fondazione IRCCS Stella Maris), Fabrizio Levrero (IRCCS Ospedale Policlinico San Martino), Raffaele Lodi (IRCCS Istituto delle Scienze Neurologiche di Bologna), Daniela Longo (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Giulia Lucignani (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Martina Lucignani (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Maria Luisa Malosio (IRCCS Istituto Clinico Humanitas), Vittorio Manzo (Istituto Auxologico Italiano, IRCCS), M. Marcella Laganà (IRCCS Fondazione don Carlo Gnocchi onlus), Silvia Marino (IRCCS Centro Neurolesi Bonino Pulejo), Jean Paul Medina (Fondazione IRCCS Istituto Neurologico Carlo Besta), Edoardo Micotti (Istituto di Ricerche Farmacologiche Mario Negri IRCCS), Claudia Morelli (Istituto Auxologico Italiano IRCCS), Alessio Moscato (IRCCS Istituti Clinici Scientifici Maugeri), Antonio Napolitano (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Anna Nigri (Fondazione IRCCS Istituto Neurologico Carlo Besta), Francesco Padelli (Fondazione IRCCS Istituto Neurologico Carlo Besta), Sara Palermo (Fondazione IRCCS Istituto Neurologico Carlo Besta), Fulvia Palesi (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, University of Pavia), Patrizia Pantano (IRCCS Neuromed), Chiara Parrillo (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Luigi Pavone (IRCCS Neuromed), Denis Peruzzo (Istituto Scientifico, IRCCS E. Medea), Nikolaos Petsas (IRCCS Neuromed), Alice Pirastru (IRCCS Fondazione don Carlo Gnocchi onlus), Letterio S. Politi (IRCCS Istituto Clinico Humanitas), Luca Roccatagliata (IRCCS Ospedale Policlinico San Martino), Elisa Rognone (Fondazione IRCCS Istituto Neurologico Naz.le Mondino), Andrea Rossi (Ospedale Pediatrico Istituto Giannina Gaslini, Università di Genova), Maria Camilla Rossi-Espagnet (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Claudia Ruvolo (IRCCS Centro Neurolesi Bonino Pulejo), Marco Salvatore (IRCCS SDN Istituto di Ricerca), Giovanni Savini (IRCCS Istituto Clinico Humanitas), Fabrizio Tagliavini (Fondazione IRCCS Istituto Neurologico Carlo Besta), Emanuela Tagliente (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Claudia Testa (IRCCS Istituto delle Scienze Neurologiche di Bologna), Caterina Tonon (IRCCS Istituto delle Scienze Neurologiche di Bologna), Domenico Tortora (Ospedale Pediatrico Istituto Giannina Gaslini), Fabio Maria Triulzi (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico).
Funding
This study was funded by the Italian Ministry of Health under the RC grant, by the 5x1000 voluntary contributions to IRCCS Fondazione Stella Maris, and under the following RIN projects: RRC-2016-2361095; RRC-2017-2364915; RRC-2018-2365796; RCR-2019-23669119_001, along with the contribution of the Ministry of Economy and Finance (CCR-2017-23669078).
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary data
The following are the Supplementary data to this article:
The impact of automated hippocampal volumetry on diagnostic confidence in patients with suspected Alzheimer’s disease: A European Alzheimer’s Disease Consortium study.
Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.
Knussmann GN, Anderson JS, Prigge MBD, Dean DC, Lange N, Bigler ED, et al. Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images. Neuroimage: Reports 2022;2:100086. doi: 10.1016/J.YNIRP.2022.100086.
Dickerson BC, Bakkour A, Salat DH, Feczko E, Pacheco J, Greve DN, et al. The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb Cortex 2009;19:497–510. doi:10.1093/cercor/bhn113.