Research Article | Volume 110, 102577, June 2023

Quality assessment, variability and reproducibility of anatomical measurements derived from T1-weighted brain imaging: The RIN–Neuroimaging Network case study

Open Access | Published: April 29, 2023 | DOI: https://doi.org/10.1016/j.ejmp.2023.102577

      Highlights

      • Reproducibility of harmonized T1-weighted protocol at 3 T was tested in a multicentric study.
      • The study measured a global variability that ranges from 11% to 19% for subcortical volumes.
      • Moreover, we measured a global variability that ranges from 3% to 10% for cortical thicknesses.
      • The Bland-Altman analysis on traveling brain measures did not detect systematic scanner biases.
      • SVM can classify the scanner vendor from brain measures with an accuracy = 0.60 ± 0.14 (chance 0.33).

      Abstract

      Initiatives for the collection of harmonized MRI datasets are growing continuously, opening questions on the reliability of results obtained in multi-site contexts.
      Here we present the assessment of the brain anatomical variability of MRI-derived measurements obtained from T1-weighted images, acquired according to the Standard Operating Procedures promoted by the RIN-Neuroimaging Network. A multicentric dataset composed of 77 brain T1w acquisitions of young healthy volunteers (mean age = 29.7 ± 5.0 years), collected in 15 sites with MRI scanners of three different vendors, was considered. In parallel, a dataset of 7 “traveling” subjects, each undergoing three acquisitions with scanners from different vendors, was also used. Intra-site, intra-vendor, and inter-site variabilities were evaluated in terms of the percentage standard deviation of volumetric and cortical thickness measures. Image quality metrics such as contrast-to-noise and signal-to-noise ratio in gray and white matter were also assessed for all sites and vendors.
      The results showed a measured global variability that ranges from 11% to 19% for subcortical volumes and from 3% to 10% for cortical thicknesses. Univariate distributions of the normalized volumes of subcortical regions, as well as the distributions of the thickness of cortical parcels, appeared to be significantly different among sites in 8 subcortical (out of 17) and 21 cortical (out of 68) regions of interest in the multicentric study.
      The Bland-Altman analysis on “traveling” brain measurements did not detect systematic scanner biases even though a multivariate classification approach was able to classify the scanner vendor from brain measures with an accuracy of 0.60 ± 0.14 (chance level 0.33).

      1. Introduction

      In recent decades, non-invasive anatomical measurements derived from Magnetic Resonance Imaging (MRI) of the brain have played a pivotal role in the assessment of many neurodevelopmental, neurodegenerative, psychiatric and rare conditions. Many of these measurements have already proved to be well-suited neuroimaging anatomical biomarkers for the early diagnosis and assessment of Alzheimer’s Disease [
      • Pini L.
      • Pievani M.
      • Bocchetta M.
      • Altomare D.
      • Bosco P.
      • Cavedo E.
      • et al.
      Brain atrophy in Alzheimer’s Disease and aging.
      ,
      • Bosco P.
      • Redolfi A.
      • Bocchetta M.
      • Ferrari C.
      • Mega A.
      • Galluzzi S.
      • et al.
      The impact of automated hippocampal volumetry on diagnostic confidence in patients with suspected Alzheimer’s disease: A European Alzheimer’s Disease Consortium study.
      ,
      • Fennema-Notestine C.
      • Hagler D.J.
      • McEvoy L.K.
      • Fleisher A.S.
      • Wu E.H.
      • Karow D.S.
      • et al.
      Structural MRI biomarkers for preclinical and mild Alzheimer’s disease.
      ], frontotemporal dementia [
      • Rohrer J.D.
      Structural brain imaging in frontotemporal dementia.
      ,
      • Meyer S.
      • Mueller K.
      • Stuke K.
      • Bisenius S.
      • Diehl-Schmid J.
      • Jessen F.
      • et al.
      Predicting behavioral variant frontotemporal dementia with pattern classification in multi-center structural MRI data.
      ], Parkinson’s Disease [
      • Ibarretxe-Bilbao N.
      • Junque C.
      • Marti M.J.
      • Tolosa E.
      Brain structural MRI correlates of cognitive dysfunctions in Parkinson’s disease.
      ,
      • Sarasso E.
      • Agosta F.
      • Piramide N.
      • Filippi M.
      Progression of grey and white matter brain damage in Parkinson’s disease: a critical review of structural MRI literature.
      ] and for the differential diagnosis of other forms of dementia such as Lewy Body Dementia [
      • Whitwell J.L.
      • Weigand S.D.
      • Shiung M.M.
      • Boeve B.F.
      • Ferman T.J.
      • Smith G.E.
      • et al.
      Focal atrophy in dementia with Lewy bodies on MRI: a distinct pattern from Alzheimer’s disease.
      ,
      • Hanyu H.
      • Shimizu S.
      • Tanaka Y.
      • Hirao K.
      • Iwamoto T.
      • Abe K.
      MR features of the substantia innominata and therapeutic implications in dementias.
      ]. In psychiatric and neurodevelopmental disorders, brain anatomical measurements have been shown to be either relevant or at least promising in the study of many diseases such as schizophrenia [
      • Wright I.C.
      • Rabe-Hesketh S.
      • Woodruff P.W.R.
      • David A.S.
      • Murray R.M.
      • Bullmore E.T.
      Meta-analysis of regional brain volumes in schizophrenia.
      ,
      • Lawrie S.M.
      • Abukmeil S.S.
      Brain abnormality in schizophrenia. A systematic and quantitative review of volumetric magnetic resonance imaging studies.
      ], major depressive disorder [
      • Andreescu C.
      • Butters M.A.
      • Begley A.
      • Rajji T.
      • Wu M.
      • Meltzer C.C.
      • et al.
      Gray matter changes in late life depression—a structural MRI analysis.
      ,
      • Amico F.
      • Meisenzahl E.
      • Koutsouleris N.
      • Reiser M.
      • Möller H.J.
      • Frodl T.
      Structural MRI correlates for vulnerability and resilience to major depressive disorder.
      ], autism spectrum disorders [
      • Amaral D.G.
      • Schumann C.M.
      • Nordahl C.W.
      Neuroanatomy of autism.
      ,
      • Ecker C.
      The neuroanatomy of autism spectrum disorder: an overview of structural neuroimaging findings and their translatability to the clinical setting.
      ,
      • Bosco P.
      • Giuliano A.
      • Delafield-Butt J.
      • Muratori F.
      • Calderoni S.
      • Retico A.
      Brainstem enlargement in preschool children with autism: Results from an intermethod agreement study of segmentation algorithms.
      ], childhood apraxia of speech [
      • Preston J.L.
      • Molfese P.J.
      • Mencl W.E.
      • Frost S.J.
      • Hoeft F.
      • Fulbright R.K.
      • et al.
      Structural brain differences in school-age children with residual speech sound errors.
      ,
      • Kadis D.S.
      • Goshulak D.
      • Namasivayam A.
      • Pukonen M.
      • Kroll R.
      • De Nil L.F.
      • et al.
      Cortical thickness in children receiving intensive therapy for idiopathic apraxia of speech.
      ,
      • Conti E.
      • Retico A.
      • Palumbo L.
      • Spera G.
      • Bosco P.
      • Biagi L.
      • et al.
      Autism spectrum disorder and childhood apraxia of speech: early language-related hallmarks across structural MRI study.
      ].
      However, many challenges remain to be tackled to advance the detection of structural brain biomarkers. Large sample sizes are strongly needed to provide sufficient statistical power for the investigation of groups and subgroups and to deal with relatively small pathology effect sizes; hence, multicentric studies are increasingly necessary for the development of both pharmacological and non-pharmacological interventions.
      In this context, recommendations on MRI acquisition and analysis details for multivendor protocols using the standard equipment available in hospitals are in most cases unclear. Moreover, clear quality control guidelines and reference values are lacking for the different markers and brain measurements extracted from T1-weighted imaging, and recommendations for the retrospective harmonization of existing data acquired with different protocols are likewise unclear [
      • Jovicich J.
      • Barkhof F.
      • Babiloni C.
      • Herholz K.
      • Mulert C.
      • Berckel B.N.M.
      • et al.
      Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.
      ]. These factors hinder the advances in the field since they represent sources of variability which, in addition to the heterogeneity of the population under exam, often hamper the detection of subtle pathological changes, even in the context of the recent development of advanced and powerful artificial intelligence techniques [
      • Ferrari E.
      • Bosco P.
      • Calderoni S.
      • Oliva P.
      • Palumbo L.
      • Spera G.
      • et al.
      Dealing with confounders and outliers in classification medical studies: The Autism Spectrum Disorders case study.
      ].
      For these reasons, many initiatives have been promoted for the harmonization of MRI acquisition protocols and data analyses, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu/) [
      • Jack C.R.
      • Bernstein M.A.
      • Fox N.C.
      • Thompson P.
      • Alexander G.
      • Harvey D.
      • et al.
      The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods.
      ] and Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) [
      • Thompson P.M.
      • Stein J.L.
      • Medland S.E.
      • Hibar D.P.
      • Vasquez A.A.
      • Renteria M.E.
      • et al.
      The ENIGMA Consortium: large-scale collaborative analyses of neuroimaging and genetic data.
      ].
      With the same aim, the RIN - Neuroimaging Network, an Italian national consortium dedicated to sharing large-scale multimodal quantitative MRI datasets, promoted the development of guidelines for data acquisition and processing [
      • Nigri A.
      • Ferraro S.
      • Gandini Wheeler-Kingshott C.A.M.
      • Tosetti M.
      • Redolfi A.
      • Forloni G.
      • et al.
      Quantitative MRI harmonization to maximize clinical impact: the RIN–neuroimaging network.
      ,
      • Lancione M.
      • Bosco P.
      • Costagli M.
      • Nigri A.
      • Aquino D.
      • Carne I.
      • et al.
      Multi-centre and multi-vendor reproducibility of a standardized protocol for quantitative susceptibility Mapping of the human brain at 3T.
      ]. In this study, we present the results obtained on brain structural MRI measures. In particular, we aimed to measure the anatomical variability of different brain structures, taking into account the influence on these measures of scanner vendor along with different hardware solutions used for data acquisition. In addition, we explored the variability of some image quality metrics which may have indirect effects on the image-derived anatomical measurements.

      2. Material and methods

      2.1 Description of the datasets

      Data were acquired in fifteen sites of the RIN – Neuroimaging Network, equipped with 3 T MRI scanners from three different vendors (Philips Healthcare, GE Healthcare, and Siemens Healthineers). Two distinct studies were conducted. The first considered data acquired in 14 sites (multicentric study); the second considered data acquired on a small number of subjects that repeated the acquisition on three sites selected on the basis of scanner vendor and geographical area (traveling brain study).

      2.2 Multicentric study

      To assess the image quality and the anatomical variability of cerebral measurements derived from T1-weighted brain MRI in a multicentric framework, a dataset composed of 77 brain acquisitions obtained from as many young healthy volunteers (45F/32M, mean age = 29.7 ± 5.0 years, range [21–45] years) was considered. In particular, we collected 29 datasets from vendor 1 (mean age = 30.6 ± 5.4 years, 18F/11M), 18 from vendor 2 (mean age = 30.4 ± 4.6 years, 10F/8M), and 30 from vendor 3 (mean age = 28.4 ± 4.8 years, 17F/13M). Details on the subjects recruited in each participating center are reported in Table 1, along with the hardware information of the MR scanner (vendor code and number of channels of the receiving coils).
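      Table 1. Details on the technical characteristics of scanners (vendor, number of channels of the receiving head coils) of each site. Both site and scanner vendor were anonymized using a numerical code. For the multicentric study, the number of the subjects recruited at each site and their demographical data are reported. For the traveling brain study, columns indicate the geographical area of each site (North Area, AN, and South Area, AS), as well as the corresponding number of acquired subjects.

      | Site | Vendor | Model | Rx Coil [ch] | Multicentric study: #subjects | Multicentric study: Age [years] | Multicentric study: Sex | Traveling brain study: Area | Traveling brain study: #subjects |
      |------|--------|-------|--------------|-------------------------------|---------------------------------|-------------------------|-----------------------------|----------------------------------|
      | 1    | 1      | a     | 32           | 5                             | 31.8 ± 1.8                      | 3F/2M                   | AN                          | 4                                |
      | 2    | 1      | a     | 32           | 6                             | 29.7 ± 4.3                      | 5F/1M                   |                             |                                  |
      | 3    | 1      | b     | 32           | 6                             | 34.0 ± 7.1                      | 3F/3M                   |                             |                                  |
      | 4    | 1      | c     | 32           | 3                             | 25.0 ± 2.0                      | 1F/2M                   |                             |                                  |
      | 5    | 1      | a     | 32           | 5                             | 31.6 ± 6.3                      | 4F/1M                   | AS                          | 3                                |
      | 6    | 1      | b     | 32           | 4                             | 28.5 ± 5.6                      | 2F/2M                   |                             |                                  |
      | 7    | 2      | d     | 32           | 7                             | 29.3 ± 2.7                      | 4F/3M                   |                             |                                  |
      | 8    | 2      | d     | 16           | 6                             | 29.3 ± 4.7                      | 4F/2M                   | AN                          | 4                                |
      | 9    | 2      | e     | 8            | 5                             | 33.4 ± 6.0                      | 2F/3M                   | AS                          | 3                                |
      | 10   | 3      | f     | 64           | 6                             | 26.3 ± 5.9                      | 4F/2M                   | AN                          | 4                                |
      | 11   | 3      | g     | 8            | 5                             | 28.0 ± 2.3                      | 1F/4M                   |                             |                                  |
      | 12   | 3      | h     | 64           | 7                             | 25.1 ± 3.3                      | 6F/1M                   |                             |                                  |
      | 13   | 3      | h     | 32           | 5                             | 32.2 ± 3.0                      | 3F/2M                   |                             |                                  |
      | 14   | 3      | i     | 12           | 7                             | 31.1 ± 4.7                      | 3F/4M                   |                             |                                  |
      | 15   | 3      | h     | 32           |                               |                                 |                         | AS                          | 3                                |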

      2.3 Traveling brain study

      The inter-scanner variability was assessed with an additional dataset composed of 7 healthy traveling subjects who underwent three brain MRI acquisitions at three sites equipped with scanners from different vendors.
      Two geographic areas (North Area, AN, and South Area, AS) were defined in Italy. Among the 7 traveling subjects, 4 subjects (mean age = 31.5 ± 2.2 years, 2F/2M) performed the T1-weighted MRI acquisitions in area AN, at sites 1, 8, and 10; the remaining 3 subjects (mean age = 28.4 ± 11.0 years, 2F/1M) were acquired in area AS, at sites 5, 9, and 15 (Table 1).

      2.4 MRI brain imaging protocol

      One of the main objectives of the RIN - Neuroimaging Network was the development of Standard Operating Procedures (SOPs) for the acquisition of a comprehensive MRI protocol for the brain. The complete set of scanning parameters for the acquisition of T1-weighted MRI imaging is reported in Table 2. The datasets, both for the multicentric study and the traveling brain study, were acquired according to the agreed SOPs.
      Table 2. Parameters of acquisition for T1-weighted MRI, differentiated for each scanner vendor, as reported in the SOPs developed by RIN – Neuroimaging Network.

      | Parameter                                   | PHILIPS         | GE               | SIEMENS              |
      |---------------------------------------------|-----------------|------------------|----------------------|
      | Sequence type                               | 3D FFE          | 3D FSPGR BRAVO   | MP-RAGE              |
      | Slice orientation                           | sagittal        | sagittal         | sagittal             |
      | FOV [mm]                                    | 240 × 240       | 256 × 256        | 256 × 256            |
      | Resolution [mm3]                            | 1 × 1 × 1       | 1 × 1 × 1        | 1 × 1 × 1            |
      | Matrix (Base Resolution)                    | 240 × 240       | 256 × 256        | 256 × 256            |
      | Slice thickness                             | 1               | 1                | 1                    |
      | Slice gap (mm)                              |                 |                  |                      |
      | Number of slices                            | 175 – 180       | 175 – 180        | 175 – 180            |
      | Phase Encoding direction                    | AP              | AP               | AP                   |
      | Slice order                                 | Interleaved     | Interleaved      | Interleaved          |
      | NSA/Averages/NEX                            | 1               | 1                | 1                    |
      | TR [ms]                                     | 2300            | not modifiable   | 2300                 |
      | TE [ms]                                     | 2.96            | 3.2              | 2.96                 |
      | TI [ms]                                     | 900             | 900              | 900                  |
      | Flip angle                                  |                 |                  |                      |
      | Fat Suppression                             | No              | No               | No                   |
      | k-space coverage (Halfscan/Partial Fourier) | No              | No               | No                   |
      | Acceleration factor                         | SENSE ≤ 2.3     | ARC = 2          | GRAPPA = 2           |
      | Filter                                      | CLEAR on        | PURE on          | Prescan Normalize on |
      | Bandwidth (Hz/pixel)                        | 191             | 122              | 240                  |
      | Duration                                    | ≈ 5 min 30 sec  | ≈ 5 min 30 sec   | ≈ 5 min 30 sec       |

      FFE = Fast Field Echo; FSPGR = Fast SPoiled GRadient echo; BRAVO = BRAin VOlume imaging; MPRAGE = Magnetization Prepared Rapid Gradient Echo.

      3. Data analysis

      3.1 Image segmentation

      The FreeSurfer (FS, v.6.0) analysis pipeline [
      • Fischl B.
      • Salat D.H.
      • Busa E.
      • Albert M.
      • Dieterich M.
      • Haselgrove C.
      • et al.
      Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain.
      ,
      • Fischl B.
      FreeSurfer.
      ] was used to carry out the segmentation of the brain in subcortical and cortical substructures (according to Desikan-Killiany parcellation [
      • Desikan R.S.
      • Ségonne F.
      • Fischl B.
      • Quinn B.T.
      • Dickerson B.C.
      • Blacker D.
      • et al.
      An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.
      ]). First, we converted the 3D T1-weighted MR brain images from DICOM to NIfTI format. Second, we ran the FS pre-processing workflow, known as the recon-all pipeline, which passes the input structural MRI scan through several FS functions and performs the full cortical reconstruction in 31 processing steps. To segment gray matter tissue, FS combines several sources of information: image intensities, global position within the brain, and position relative to neighboring brain regions. Based on this information, it uses a probabilistic atlas in which coordinates have anatomical meaning, together with a Markov Random Field (MRF) model that captures the local spatial relationships between labeled structures. FreeSurfer models each structure at each point in space with a mixture of a small number of Gaussians and uses a maximum a posteriori estimate of the model parameters to assign one of the Region of Interest (ROI) labels to each voxel. From the FS segmentation results, we extracted the volumes (mm³) of the subcortical gray matter structures and the thicknesses (mm) of the cortical regions. Along with the brain structures, FS was used to measure the total intracranial volume, which is a well-established measurement for volume normalization across subjects [
      • Whitwell J.L.
      • Crum W.R.
      • Watt H.C.
      • Fox N.C.
      Normalization of cerebral volumes by use of intracranial volume: Implications for longitudinal quantitative mr imaging.
      ]. In Fig. 1A an example of brain structural T1-weighted images is shown together with the overlay of FreeSurfer segmentation results (in false color) of subcortical and cortical gray matter structures (Fig. 1B).
      Fig. 1. A. Sagittal, coronal and axial views of a raw 3D T1-weighted image of a representative subject of the dataset. B. Overlay of FreeSurfer segmentation results (in false color) of subcortical and cortical gray matter structures.

      3.2 Quality control

      A pipeline for image quality control on T1-weighted dataset was implemented, by using the MRIQC protocol [
      • Esteban O.
      • Birman D.
      • Schaer M.
      • Koyejo O.O.
      • Poldrack R.A.
      • Gorgolewski K.J.
      • et al.
      MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites.
      ]. The following measures were extracted from each dataset for the evaluation of the main quality indicators:
      - Contrast-to-Noise Ratio [
      • Magnotta V.A.
      • Friedman L.
      Measurement of signal-to-noise and contrast-to-noise in the fBIRN multicenter imaging study.
      ] (CNR): the CNR evaluates how well separated the signal intensity distributions of adjacent tissues are. Here, CNR refers specifically to the contrast between gray matter (GM) and white matter (WM). Higher values indicate a better definition of gray matter structures with respect to the surrounding areas. Additionally, the contrast-to-noise ratio between GM and cerebrospinal fluid (CSF), denoted CNR_GMCSF, was evaluated in order to investigate the impact of this contrast on the segmentation of GM structures surrounded by CSF.
      - Signal-to-Noise Ratio (SNR): the SNR evaluates how strong the signal intensity in a specific region is with respect to the noise fluctuations. It is calculated as the ratio between the mean intensity of the considered tissue and its standard deviation in the same region.
      - Entropy Focus Criterion [
      • Atkinson D.
      • Hill D.L.G.
      • Stoyle P.N.R.
      • Summers P.E.
      • Keevil S.F.
      Automatic correction of motion artifacts in magnetic resonance images using an entropy focus criterion.
      ] (EFC): the EFC uses the Shannon entropy of voxel intensities as an indication of ghosting and blurring. Lower values indicate fewer artifacts and better image quality.
      - Coefficient of Joint Variation (CJV): the CJV of gray (GM) and white matter (WM) was proposed for the evaluation of intensity non-uniformity. Higher values indicate worse image quality due to the presence of heavy head motion and large intensity non-uniformity artifacts.
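      The tissue-based metrics listed above can be illustrated with a minimal sketch. This is not the MRIQC implementation: the function names and the toy intensity values are hypothetical, the CNR is written in a simplified form that uses only the two tissue variances as the noise term (an assumption), and the EFC is left out because its normalization is specific to the tool.

```python
import numpy as np


def snr(tissue):
    """SNR as described in the text: mean tissue intensity over its standard deviation."""
    tissue = np.asarray(tissue, dtype=float)
    return tissue.mean() / tissue.std(ddof=1)


def cnr(gm, wm):
    """Simplified CNR (assumption): |mean difference| over the pooled GM/WM noise."""
    gm = np.asarray(gm, dtype=float)
    wm = np.asarray(wm, dtype=float)
    return abs(wm.mean() - gm.mean()) / np.sqrt(gm.var(ddof=1) + wm.var(ddof=1))


def cjv(gm, wm):
    """CJV: sum of the GM and WM standard deviations over the |mean difference|."""
    gm = np.asarray(gm, dtype=float)
    wm = np.asarray(wm, dtype=float)
    return (gm.std(ddof=1) + wm.std(ddof=1)) / abs(wm.mean() - gm.mean())


# Hypothetical voxel intensities sampled inside GM and WM masks.
gm_voxels = [60.0, 62.0, 58.0, 61.0, 59.0]
wm_voxels = [100.0, 102.0, 98.0, 101.0, 99.0]

print(f"SNR_GM = {snr(gm_voxels):.2f}")
print(f"SNR_WM = {snr(wm_voxels):.2f}")
print(f"CNR    = {cnr(gm_voxels, wm_voxels):.2f}")
print(f"CJV    = {cjv(gm_voxels, wm_voxels):.4f}")
```

      In this form, higher SNR and CNR and lower CJV correspond to better image quality, which matches the interpretation given in the list above.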

      3.3 Variability assessment

      The variability of anatomical measurements of cortical and subcortical regions was assessed through the standard deviation of the measures on the whole multicentric data set, in terms of percentage with respect to the corresponding mean value. Moreover, the minimum and the maximum of the standard deviation were calculated for intra-site, inter-site and intra-vendor scenarios.
      The distributions of quality control measurements were also calculated separately for different vendors, for different scanner models and number of elements of the receiving RF coils.
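      The percentage standard deviation described in this section can be sketched as follows. The record layout, the function names and the numerical values are hypothetical; the use of the sample standard deviation and the pooling of all sites of a vendor for the intra-vendor figure are assumptions, since the text does not specify them.

```python
from collections import defaultdict
from statistics import mean, stdev


def percent_sd(values):
    """Standard deviation expressed as a percentage of the mean (sample SD, an assumption)."""
    return 100.0 * stdev(values) / mean(values)


def variability_summary(records):
    """Summarize one anatomical measure.

    records: iterable of (site, vendor, value) tuples, where value is, for
    example, a volume already normalized to the total intracranial volume.
    """
    by_site, by_vendor, all_values = defaultdict(list), defaultdict(list), []
    for site, vendor, value in records:
        by_site[site].append(value)
        by_vendor[vendor].append(value)
        all_values.append(value)
    # Sites with a single subject have no defined standard deviation and are skipped.
    site_sd = [percent_sd(v) for v in by_site.values() if len(v) > 1]
    return {
        "intra_site_min": min(site_sd),
        "intra_site_max": max(site_sd),
        "intra_site_mean": mean(site_sd),
        "intra_vendor": {k: percent_sd(v) for k, v in by_vendor.items()},
        "global": percent_sd(all_values),
    }


# Hypothetical normalized values for one structure (arbitrary units).
toy_records = [
    ("site1", "V1", 10.0), ("site1", "V1", 12.0),
    ("site2", "V1", 9.0), ("site2", "V1", 11.0),
    ("site3", "V2", 10.0), ("site3", "V2", 10.5), ("site3", "V2", 11.0),
]
summary = variability_summary(toy_records)
for key, value in summary.items():
    print(key, value)
```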

      3.4 Statistical analyses

      The mean values of the extracted measurements were compared among the participating centers with an ANOVA test, with the statistical significance threshold set at p-value = 0.01 (both uncorrected and with False Discovery Rate correction).
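      A minimal sketch of this per-region comparison is given below, assuming a one-way ANOVA across sites (SciPy's `f_oneway`) and the Benjamini-Hochberg procedure for the False Discovery Rate correction. The text does not state which FDR procedure was used, so that choice is an assumption, and the data are synthetic placeholders.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_across_sites(values_by_site):
    """One-way ANOVA p-value for one measure; values_by_site maps site -> values."""
    return f_oneway(*values_by_site.values()).pvalue


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    adjusted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out


# Synthetic example: 5 regions, 4 sites, 6 subjects per site.
rng = np.random.default_rng(0)
raw_p = []
for roi in range(5):
    site_shift = 0.2 * roi  # larger site effect for later regions
    values_by_site = {
        site: rng.normal(loc=3.0 + site_shift * site, scale=0.2, size=6)
        for site in range(4)
    }
    raw_p.append(anova_across_sites(values_by_site))

adjusted_p = benjamini_hochberg(raw_p)
significant = adjusted_p < 0.01  # threshold used in the text
print(np.round(raw_p, 4), np.round(adjusted_p, 4), significant)
```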
      For the traveling brain study, Bland-Altman plots were used as a complementary approach to evaluate the agreement between the extracted measures and to assess the variability at both the subject and the traveling brain cohort level. To assess potential biases limited to specific regions, paired t-tests were performed for each anatomical measurement and for every pair of vendors under analysis.
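      The quantities behind a Bland-Altman plot for one pair of vendors can be sketched as follows. The function name and the values are hypothetical, and the conventional limits of agreement at ±1.96 standard deviations of the differences are an assumption, as the text does not report which limits were drawn.

```python
import numpy as np


def bland_altman(measure_a, measure_b):
    """Return the per-subject means and differences, the bias and the limits of agreement."""
    a = np.asarray(measure_a, dtype=float)
    b = np.asarray(measure_b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return {
        "pair_mean": (a + b) / 2.0,   # x-axis of the plot
        "diff": diff,                 # y-axis of the plot
        "bias": bias,                 # systematic offset between the two scanners
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
    }


# Hypothetical measurements of one structure on the same four subjects
# scanned with two different vendors.
vendor_1 = [4.1, 4.3, 3.9, 4.0]
vendor_2 = [4.0, 4.4, 3.8, 4.1]
result = bland_altman(vendor_1, vendor_2)
print(f"bias = {result['bias']:.3f}, "
      f"limits of agreement = [{result['loa_low']:.3f}, {result['loa_high']:.3f}]")
```

      A bias close to zero with limits of agreement that contain zero is the pattern expected when no systematic scanner offset is present.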
      The segmentation of the images, the quality control and the statistical analyses were performed at a single site, under the same operating system in order to avoid additional sources of variability [
      • Gronenschild E.H.B.M.
      • Habets P.
      • Jacobs H.I.L.
      • Mengelers R.
      • Rozendaal N.
      • van Os J.
      • et al.
      The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements.
      ].

      3.5 Machine learning experiments on the multicentric data

      In order to assess the residual dependency of the brain anatomical measurements on the acquisition characteristics after the application of the SOPs, a simple Support Vector Machine (SVM) classifier [
      • Cortes C.
      • Vapnik V.
      Support-vector networks.
      ] was trained on the anatomical measures extracted by the segmentation algorithm to recognize the Vendor (Vendor 1, Vendor 2, Vendor 3) that manufactured the scanner.
      The same classification problem was tackled by feeding a linear SVM classifier with the quality metrics (CJV, CNR, SNR_WM, SNR_GM, EFC) extracted through the MRIQC method.
      In both scenarios, the training and testing procedure was performed through a cross validation approach (8-fold cross validation). The classification performance was assessed by measuring the mean accuracy and the standard deviation of the accuracy on the 8 validation folds.
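      The cross-validated vendor classification can be sketched with scikit-learn as follows. The features are random placeholders with an artificial vendor offset, not the study data; the feature standardization, the stratified folds, the linear kernel for the anatomical features and the value of `C` are assumptions that the text does not specify.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Class sizes follow the per-vendor sample sizes reported in Section 2.2.
y = np.repeat([1, 2, 3], [29, 18, 30])

# Placeholder features: 17 subcortical volumes + 68 cortical thicknesses.
X = rng.normal(size=(y.size, 85))
X[:, :5] += 0.5 * y[:, None]  # artificial vendor-dependent offset in five features

# Standardize within each training fold, then fit a linear SVM.
classifier = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
folds = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
scores = cross_val_score(classifier, X, y, cv=folds, scoring="accuracy")

print(f"accuracy = {scores.mean():.2f} ± {scores.std():.2f} over {scores.size} folds")
```

      With three classes, the chance level is about 0.33, so the mean accuracy over the folds should be read against that reference.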

      4. Results

      Fig. 2 reports the images obtained with all vendors, from two subjects of the travelling brain study, one acquired in the North area (AN, left panel) and one in the South area (AS, right panel).
      Fig. 2. T1w images acquired with the three vendors on the same two subjects: one subject in the North area (AN, left column) and one subject in the South area (AS, right column). The 3D images were registered to the MNI-152 (Montreal Neurological Institute) template, and intensities were rescaled between 0 and 1 using the 1st intensity percentile as the minimum reference and the 99th intensity percentile as the maximum reference.
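      The percentile-based intensity rescaling mentioned in the caption of Fig. 2 can be sketched as below. The function name is hypothetical, and clipping the values outside the two percentiles to 0 and 1 is an assumption, since the caption does not say how they were handled.

```python
import numpy as np


def rescale_intensity(image, low_pct=1.0, high_pct=99.0):
    """Map the low/high intensity percentiles to 0 and 1, clipping values outside."""
    image = np.asarray(image, dtype=float)
    low, high = np.percentile(image, [low_pct, high_pct])
    return np.clip((image - low) / (high - low), 0.0, 1.0)


# Synthetic 3D volume standing in for a registered T1w image.
volume = np.arange(1000, dtype=float).reshape(10, 10, 10)
rescaled = rescale_intensity(volume)
print(rescaled.min(), rescaled.max(), rescaled.shape)
```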

      4.1 Multicentric variability

      Fig. 3 shows two examples of distributions of volume and thickness measurements obtained across sites and vendors. Left and right hippocampal values (normalized to the total intracranial volume) are reported together with the thicknesses of the left and right precuneus cortices. The chosen structures are particularly suited for the study of neurodegeneration and aging, since they are strongly involved in cognition and memory.
      Fig. 3. Examples of box plots of anatomical measurement distributions across the different sites (left panels, site numerical codes on the x-axis) and the different vendors (right panels, vendor numerical codes on the x-axis). In the first row, the volumes of the left and right hippocampus (normalized to the total intracranial volume (TIV)) are reported. In the second row, the thicknesses of the left and right precuneus cortices are shown. The bottom and top edges of each box indicate the 25th and 75th percentile of the measure distribution respectively, and the central line indicates the median. Color code: blue for Vendor 1 (V1), orange for V2, and green for V3. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
      For a more exhaustive description of the results, the values of volume variability of all the segmented subcortical structures on the multicentric dataset are reported in Table 3. It reports, for each structure, the minimum, the maximum, and the mean of intra-site percentage variability, to assess the range of volume variation within a single site scenario. The variations were evaluated on volume values normalized to the total intra-cranial volume (TIV). Analogously, the intra-vendor and inter-site variabilities are reported. In addition, the global inter-site mean volume values and statistical significance of the ANOVA test on the compatibility of inter-site sampling are shown. The intra-site minimum variation ranges from 1.91% to 10.31%. The maximum ranges from 15.79% to 27.93% (global mean 11.36%). The intra-vendor variability calculated on the three separate datasets ranges from 5.44% (V2) to 17.70% (V3). Considering the average across all the subcortical areas, the mean variabilities among sites of the same vendor are 14.69% for V1, 8.72% for V2, 9.74% for V3; the latter values are comparable to the inter-site variability calculated on the entire dataset which ranges from 11.4% to 19.13%, with an average of 13.84%.
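      Table 3. Anatomical variabilities of the measurements of the volume of subcortical structures. Intra-site analysis: minimum, maximum and mean standard deviations calculated on the 14 sites. Intra-vendor analysis: mean standard deviations calculated on datasets from sites of the same vendor (V1, V2, V3 as in Table 1). Inter-site analysis: global standard deviation and mean of volume measurements, and statistical significance of the ANOVA test on the mean compatibility across sites (*p-value < 0.01). The variations were evaluated on volumes normalized to the Total Intracranial volume (TIV).

      | ROI            | Hemi | Intra-site min (%SD) | Intra-site max (%SD) | Intra-site mean (%SD) | Intra-vendor V1 (%SD) | Intra-vendor V2 (%SD) | Intra-vendor V3 (%SD) | Inter-site variability (%SD) | Inter-site mean volume (mm3) | p-value (ANOVA) | p-value (FDR corr) |
      |----------------|------|------|------|------|------|------|------|------|-------|----------|----------|
      | Thalamus       | L    | 2.7  | 21.5 | 10.6 | 15.3 | 7.4  | 6.4  | 11.9 | 7663  | 0.311    | 0.311    |
      | Thalamus       | R    | 1.9  | 22.6 | 9.4  | 14.4 | 5.4  | 6.3  | 11.4 | 7300  | 0.081    | 0.091    |
      | Caudate        | L    | 6.1  | 16.9 | 10.5 | 13.3 | 11.0 | 8.2  | 12.2 | 3592  | 0.012    | 0.012    |
      | Caudate        | R    | 5.9  | 15.8 | 9.8  | 12.9 | 9.1  | 8.3  | 12.8 | 3663  | < 0.001* | < 0.001* |
      | Putamen        | L    | 5.0  | 26.7 | 11.6 | 16.1 | 8.7  | 9.4  | 15.1 | 5039  | 0.002*   | 0.006*   |
      | Putamen        | R    | 3.9  | 26.5 | 10.7 | 16.4 | 6.2  | 8.1  | 13.9 | 5091  | 0.010    | 0.018    |
      | Pallidum       | L    | 5.1  | 25.5 | 11.8 | 15.8 | 8.5  | 9.7  | 12.9 | 1979  | 0.294    | 0.311    |
      | Pallidum       | R    | 4.7  | 23.6 | 11.3 | 15.6 | 7.2  | 9.1  | 13.0 | 1938  | 0.069    | 0.084    |
      | Hippocampus    | L    | 5.5  | 19.7 | 10.4 | 13.3 | 7.1  | 9.5  | 13.1 | 4173  | 0.001*   | 0.003*   |
      | Hippocampus    | R    | 4.4  | 17.9 | 10.0 | 13.4 | 6.1  | 8.0  | 12.3 | 4309  | 0.005*   | 0.012    |
      | Amygdala       | L    | 2.6  | 22.8 | 11.0 | 14.1 | 6.9  | 11.7 | 15.5 | 1659  | < 0.001* | < 0.001* |
      | Amygdala       | R    | 4.6  | 25.3 | 10.5 | 15.2 | 7.7  | 8.8  | 14.2 | 1770  | 0.002*   | 0.006*   |
      | Accumbens area | L    | 5.3  | 27.9 | 14.5 | 15.3 | 17.6 | 15.5 | 19.1 | 550   | < 0.001* | < 0.001* |
      | Accumbens area | R    | 10.3 | 25.9 | 16.9 | 16.0 | 14.1 | 17.7 | 18.6 | 586   | 0.022    | 0.031    |
      | VentralDC      | L    | 4.2  | 20.6 | 11.2 | 14.2 | 8.2  | 10.0 | 13.2 | 4186  | 0.016    | 0.024    |
      | VentralDC      | R    | 6.1  | 21.3 | 11.2 | 13.9 | 8.9  | 9.7  | 13.2 | 4134  | 0.006*   | 0.012    |
      | Brain Stem     |      | 6.7  | 17.3 | 11.6 | 14.4 | 8.1  | 9.3  | 12.9 | 21692 | 0.065    | 0.084    |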
      In all the considered cases, the highest variability is found for the nuclei accumbens, which are small structures that are difficult to segment with an automated tool.
      Similarly, Table 4 reports the measured cortical thickness variations across brain cortex parcels. The intra-site minimum variation ranges from 0.3% to 5.0%. The maximum ranges from 5.0% to 15.5% (global mean 5.1%). The intra-vendor variability of brain cortex thickness ranges from 3.1% (V1) to 12.2% (V2). Considering the average across all the cortical areas, the mean variabilities among sites of the same vendor are 5.3% for V1, 5.5% for V2, 5.4% for V3; the latter values are comparable to the inter-site variability of brain cortex thickness calculated on the entire dataset, which ranges from 3.3% to 10.4%, with an average of 5.7%.
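      Table 4. Anatomical variabilities of the measurements of the thicknesses of cortical structures. Intra-site analysis: minimum, maximum, and mean standard deviations calculated on the 14 sites. Intra-vendor analysis: mean standard deviations calculated on datasets from sites of the same vendor (V1, V2, V3 as in Table 1). Inter-site analysis: global standard deviation and mean of thickness measurements and statistical significance of the ANOVA test on the mean compatibility across sites (*p-value < 0.01).

      | Cortical ROI              | Hemi | Intra-site min (%SD) | Intra-site max (%SD) | Intra-site mean (%SD) | Intra-vendor V1 (%SD) | Intra-vendor V2 (%SD) | Intra-vendor V3 (%SD) | Inter-site variability (%SD) | Inter-site mean thickness (mm) | p-value (ANOVA) | p-value (FDR corr) |
      |---------------------------|------|-----|------|-----|------|------|------|------|------|---------|---------|
      | banks superior temporal   | L    | 3.6 | 10.9 | 5.5 | 6.2  | 5.6  | 5.5  | 5.8  | 2.55 | 0.044   | 0.116   |
      | banks superior temporal   | R    | 2.5 | 7.9  | 4.6 | 4.6  | 5.7  | 5.0  | 5.1  | 2.65 | 0.108   | 0.179   |
      | caudal anterior cingulate | L    | 2.2 | 11.1 | 7.0 | 8.7  | 6.5  | 7.4  | 7.7  | 2.67 | 0.049   | 0.116   |
      | caudal anterior cingulate | R    | 3.3 | 11.4 | 6.5 | 5.9  | 6.3  | 7.8  | 6.8  | 2.52 | 0.374   | 0.439   |
      | caudal middle frontal     | L    | 2.1 | 5.5  | 4.1 | 3.9  | 4.3  | 5.0  | 4.4  | 2.57 | 0.178   | 0.257   |
      | caudal middle frontal     | R    | 0.3 | 6.4  | 3.8 | 3.1  | 4.7  | 4.1  | 4.0  | 2.54 | 0.253   | 0.324   |
      | cuneus                    | L    | 3.0 | 11.7 | 6.0 | 4.9  | 8.5  | 6.3  | 6.8  | 1.91 | 0.266   | 0.330   |
      | cuneus                    | R    | 1.5 | 12.0 | 5.8 | 4.7  | 9.3  | 6.3  | 7.0  | 1.95 | 0.113   | 0.182   |
      | entorhinal                | L    | 3.9 | 12.9 | 7.3 | 8.7  | 5.9  | 8.0  | 8.2  | 3.37 | 0.069   | 0.150   |
      | entorhinal                | R    | 5.0 | 14.9 | 9.5 | 10.0 | 6.6  | 11.4 | 10.4 | 3.47 | 0.058   | 0.130   |
      | fusiform                  | L    | 1.6 | 5.0  | 3.0 | 3.6  | 3.3  | 2.9  | 3.3  | 2.79 | 0.046   | 0.116   |
      | fusiform                  | R    | 2.2 | 5.2  | 3.8 | 4.6  | 3.4  | 3.5  | 4.4  | 2.81 | 0.003*  | 0.011   |
      | inferiorparietal          | L    | 1.6 | 6.3  | 3.7 | 3.8  | 3.5  | 5.3  | 4.3  | 2.50 | 0.004*  | 0.013   |
      | inferiorparietal          | R    | 2.4 | 5.2  | 3.8 | 3.6  | 4.6  | 3.9  | 4.1  | 2.52 | 0.050   | 0.116   |
      | inferiortemporal          | L    | 1.7 | 7.9  | 4.2 | 4.6  | 4.8  | 4.3  | 5.4  | 2.77 | <0.001* | 0.001*  |
      | inferiortemporal          | R    | 2.2 | 8.0  | 4.3 | 4.6  | 5.4  | 3.3  | 5.3  | 2.79 | <0.001* | 0.001*  |
      | isthmuscingulate          | L    | 2.2 | 8.9  | 6.1 | 6.6  | 5.4  | 7.3  | 6.6  | 2.45 | 0.420   | 0.468   |
      | isthmuscingulate          | R    | 3.5 | 8.2  | 5.7 | 6.3  | 6.0  | 6.0  | 6.1  | 2.47 | 0.077   | 0.158   |
      | lateraloccipital          | L    | 2.1 | 7.4  | 4.1 | 4.4  | 4.4  | 5.1  | 5.1  | 2.21 | <0.001* | 0.002*  |
      | lateraloccipital          | R    | 2.5 | 6.9  | 4.5 | 4.5  | 5.6  | 5.5  | 5.6  | 2.28 | <0.001* | 0.002*  |
      | lateralorbitofrontal      | L    | 2.2 | 6.8  | 4.0 | 4.0  | 4.5  | 4.3  | 5.4  | 2.68 | <0.001* | <0.001* |
      | lateralorbitofrontal      | R    | 2.0 | 7.7  | 4.5 | 4.5  | 4.1  | 5.4  | 5.4  | 2.63 | 0.001*  | 0.005*  |
      | lingual                   | L    | 1.8 | 8.4  | 4.9 | 4.2  | 6.4  | 5.1  | 5.5  | 2.11 | 0.155   | 0.235   |
      | lingual                   | R    | 2.2 | 9.3  | 4.7 | 3.5  | 7.8  | 4.4  | 5.3  | 2.12 | 0.340   | 0.406   |
      | medialorbitofrontal       | L    | 0.7 | 7.4  | 5.1 | 4.9  | 5.0  | 5.0  | 6.6  | 2.48 | <0.001* | <0.001* |
      | medialorbitofrontal       | R    | 1.9 | 7.7  | 5.0 | 4.6  | 5.2  | 6.5  | 6.3  | 2.51 | 0.001*  | 0.005*  |
      | middletemporal            | L    | 1.2 | 5.9  | 4.2 | 4.8  | 4.3  | 4.6  | 5.1  | 2.89 | 0.001*  | 0.005*  |
      | middletemporal            | R    | 2.4 | 6.1  | 3.5 | 4.1  | 3.3  | 3.7  | 3.8  | 2.90 | 0.184   | 0.260   |
      | parahippocampal           | L    | 4.7 | 11.2 | 8.0 | 8.5  | 8.6  | 7.7  | 8.5  | 2.89 | 0.091   | 0.167   |
      | parahippocampal           | R    | 2.9 | 9.3  | 6.0 | 5.8  | 7.1  | 6.2  | 6.6  | 2.88 | 0.071   | 0.151   |
      | paracentral               | L    | 2.1 | 7.0  | 4.7 | 4.1  | 5.9  | 5.6  | 5.2  | 2.47 | 0.156   | 0.236   |
      | paracentral               | R    | 2.6 | 10.1 | 4.8 | 3.9  | 4.9  | 6.7  | 5.5  | 2.49 | 0.033   | 0.097   |
      | parsopercularis           | L    | 1.6 | 7.4  | 4.7 | 5.1  | 5.0  | 4.8  | 4.9  | 2.62 | 0.381   | 0.439   |
      | parsopercularis           | R    | 2.6 | 9.2  | 4.7 | 5.5  | 3.8  | 4.0  | 4.6  | 2.61 | 0.864   | 0.864   |
      | parsorbitalis             | L    | 3.0 | 9.1  | 5.0 | 4.8  | 5.5  | 6.3  | 5.7  | 2.72 | 0.023   | 0.070   |
      | parsorbitalis             | R    | 3.3 | 9.0  | 5.8 | 5.6  | 5.8  | 6.6  | 6.3  | 2.72 | 0.097   | 0.174   |
      | parstriangularis          | L    | 2.0 | 6.5  | 3.9 | 5.2  | 4.3  | 4.8  | 4.8  | 2.50 | 0.003*  | 0.011   |
      | parstriangularis          | R    | 2.6 | 7.1  | 5.1 | 5.7  | 5.0  | 4.9  | 5.3  | 2.47 | 0.249   | 0.324   |
      | pericalcarine             | L    | 3.8 | 9.5  | 6.6 | 6.4  | 8.4  | 6.7  | 8.1  | 1.68 | 0.001*  | 0.005*  |
      | pericalcarine             | R    | 4.0 | 15.5 | 8.2 | 7.3  | 12.2 | 7.4  | 8.9  | 1.68 | 0.214   | 0.297   |
      | postcentral               | L    | 1.3 | 7.1  | 4.3 | 4.6  | 5.0  | 5.1  | 5.5  | 2.12 | <0.001* | 0.002*  |
      | postcentral               | R    | 3.0 | 9.7  | 5.2 | 4.4  | 5.9  | 6.3  | 5.9  | 2.09 | 0.243   | 0.324   |
      | posteriorcingulate        | L    | 1.8 | 11.1 | 5.9 | 7.1  | 4.1  | 6.7  | 6.3  | 2.52 | 0.517   | 0.558   |
      | posteriorcingulate        | R    | 2.6 | 6.0  | 3.8 | 5.1  | 2.9  | 4.0  | 4.5  | 2.50 | 0.001*  | 0.005*  |
      | precentral                | L    | 1.1 | 5.9  | 3.4 | 3.3  | 3.7  | 4.5  | 4.1  | 2.63 | 0.040   | 0.109   |
      | precentral                | R    | 2.8 | 7.6  | 4.0 | 3.7  | 3.6  | 5.3  | 4.3  | 2.58 | 0.090   | 0.169   |
      | precuneus                 | L    | 3.0 | 6.2  | 4.3 | 3.8  | 4.5  | 5.0  | 4.4  | 2.48 | 0.576   | 0.602   |
      | precuneus                 | R    | 1.3 | 5.7  | 3.8 | 3.7  | 3.8  | 4.1  | 3.9  | 2.47 | 0.622   | 0.640   |
      | rostralanteriorcingulate  | L    | 2.1 | 8.6  | 6.1 | 7.0  | 5.7  | 5.7  | 6.3  | 2.93 | 0.418   | 0.468   |
      | rostralanteriorcingulate  | R    | 4.0 | 12.3 | 6.7 | 8.3  | 4.8  | 6.2  | 7.4  | 2.94 | 0.088   | 0.167   |
      | rostralmiddlefrontal      | L    | 1.5 | 5.2  | 3.2 | 4.2  | 3.9  | 3.6  | 4.5  | 2.39 | <0.001* | <0.001* |
      | rostralmiddlefrontal      | R    | 2.8 | 5.8  | 4.1 | 4.6  | 3.7  | 4.7  | 5.1  | 2.35 | <0.001* | 0.002*  |
      | superiorfrontal           | L    | 1.8 | 8.1  | 4.4 | 4.8  | 4.3  | 4.8  | 4.7  | 2.74 | 0.100   | 0.174   |
      | superiorfrontal           | R    | 2.0 | 5.8  | 3.8 | 3.8  | 4.4  | 4.6  | 5.3  | 2.71 | <0.001* | <0.001* |
      | superiorparietal          | L    | 2.8 | 6.0  | 4.3 | 4.0  | 4.1  | 5.1  | 4.6  | 2.25 | 0.080   | 0.160   |
      | superiorparietal          | R    | 0.9 | 5.7  | 3.8 | 4.1  | 4.5  | 4.5  | 4.7  | 2.23 | 0.004*  | 0.015   |
      | superiortemporal          | L    | 1.7 | 5.5  | 3.5 | 3.9  | 3.3  | 4.0  | 3.9  | 2.86 | 0.005*  | 0.016   |
      | superiortemporal          | R    | 2.0 | 8.7  | 4.4 | 5.2  | 4.2  | 3.4  | 4.3  | 2.89 | 0.535   | 0.569   |
      | supramarginal             | L    | 1.4 | 6.0  | 3.6 | 3.5  | 2.8  | 5.2  | 4.0  | 2.59 | 0.106   | 0.180   |
      | supramarginal             | R    | 0.8 | 5.4  | 3.4 | 4.1  | 3.8  | 3.8  | 4.2  | 2.58 | 0.001*  | 0.003*  |
      | frontalpole               | L    | 2.0 | 12.1 | 7.6 | 7.5  | 8.1  | 7.2  | 7.9  | 2.72 | 0.454   | 0.498   |
      | frontalpole               | R    | 3.9 | 10.5 | 7.9 | 8.0  | 7.8  | 8.8  | 8.2  | 2.69 | 0.262   | 0.330   |
      | temporalpole              | L    | 2.0 | 13.8 | 6.8 | 7.5  | 9.0  | 6.5  | 7.5  | 3.69 | 0.840   | 0.852   |
      | temporalpole              | R    | 0.6 | 11.6 | 6.2 | 8.1  | 6.3  | 6.7  | 7.1  | 3.76 | 0.139   | 0.219   |
      | transversetemporal        | L    | 2.7 | 13.3 | 7.5 | 7.4  | 10.8 | 7.0  | 8.2  | 2.53 | 0.240   | 0.324   |
      | transversetemporal        | R    | 4.1 | 14.0 | 7.1 | 7.2  | 9.9  | 5.6  | 7.4  | 2.57 | 0.296   | 0.360   |
      | insula                    | L    | 1.1 | 6.1  | 4.3 | 6.2  | 4.4  | 3.4  | 4.8  | 3.05 | 0.039   | 0.109   |
      | insula                    | R    | 2.8 | 7.8  | 4.5 | 5.5  | 4.7  | 3.9  | 4.6  | 3.07 | 0.168   | 0.249   |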
      The distributions of the normalized volumes of subcortical regions, as well as the distributions of the thickness of cortical parcels, appeared to be significantly different among sites in 8 of 17 subcortical ROIs and in 21 of 68 cortical ROIs.
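      The relation between the uncorrected and the FDR-corrected p values can be illustrated with a short sketch. This section does not name the correction procedure, so the Benjamini-Hochberg step-up method is assumed here; the function name `bh_adjust` and the sample p values are illustrative only and are not taken from the table.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (not the 68 cortical ROIs of the study).
example = [0.001, 0.008, 0.039, 0.041, 0.60]
print([round(q, 3) for q in bh_adjust(example)])
```

      With a correction of this kind, a region counts as significant only if its adjusted value stays below the chosen threshold, which is why fewer regions carry an asterisk in the corrected column than in the uncorrected one.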

      4.2 Quality control measurements

      Fig. 4 reports the distributions of each quality control metric across sites. For every metric, the intra-site distribution of the values appeared to be specific to each site, with a variability that differs markedly from the global one.
      Fig. 4. Box plots of the distributions of quality control metrics across sites (left panels, site codes on the x-axis) and vendors (right panels, vendor codes on the x-axis): (A) the contrast-to-noise ratio (CNR), (B) the contrast-to-noise ratio between GM and CSF (CNRGMCSF), (C) the signal-to-noise ratio for gray matter (SNRGM) and (D) for white matter (SNRWM), (E) the entropy focus criterion (EFC) and (F) the coefficient of joint variation (CJV). Color code: blue for Vendor 1 (V1), orange for V2, and green for V3. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
      In order to disentangle the contributions to the variability of the quality metrics, the contrast-to-noise ratio and the signal-to-noise ratio in brain gray matter were aggregated not only by scanner vendor, but also by scanner model and by number of channels of the head coil (Fig. 5). The distributions were strongly dependent on the vendor, while neither the specific scanner model nor the number of elements of the head coil seemed to have a significant impact on the considered metrics.
      Fig. 5. Distributions of the contrast-to-noise ratio (CNR) and of the signal-to-noise ratio in gray matter (SNRGM) for data aggregated by vendor and scanner model: 3 models for V1, 2 models for V2 and 4 models for V3. The boxplot colors correspond to the number of channels of the head coils.
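      As a rough illustration of how metrics of this kind summarize tissue intensities, the sketch below computes a CNR, a CJV and an SNR from lists of voxel intensities. The formulas are a commonly used formulation and are an assumption here; the exact definitions applied in the study are those of the quality control tool and may differ in detail. All function names and intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def cnr(gm, wm, air):
    """Contrast-to-noise ratio between gray and white matter (assumed formula)."""
    noise = sqrt(pstdev(air) ** 2 + pstdev(gm) ** 2 + pstdev(wm) ** 2)
    return abs(mean(gm) - mean(wm)) / noise


def cjv(gm, wm):
    """Coefficient of joint variation: lower values mean better tissue separation."""
    return (pstdev(gm) + pstdev(wm)) / abs(mean(wm) - mean(gm))


def snr(tissue):
    """Signal-to-noise ratio within one tissue: mean over sample SD."""
    return mean(tissue) / stdev(tissue)


# Hypothetical voxel intensities for three masks.
gm = [98.0, 100.0, 102.0]
wm = [148.0, 150.0, 152.0]
air = [0.0, 1.0, 2.0]
print(f"CNR = {cnr(gm, wm, air):.2f}, CJV = {cjv(gm, wm):.3f}, SNR_GM = {snr(gm):.1f}")
```

      Because all three quantities depend on tissue means and noise levels, a vendor-specific contrast or noise profile shifts them together, which is compatible with the strong vendor dependence reported above.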

      4.3 Machine learning experiments

      The linear SVM classifier, trained on the anatomical measurements extracted by the segmentation algorithm in the multicentric study, was able to classify the scanner vendor with an average accuracy of 0.60 ± 0.14. This value should be compared with the chance level, which is equal to 0.33 for a three-class classification. A similar SVM classifier, trained on the quality control metrics extracted through the MRIQC protocol, was able to classify the scanner vendor with an average accuracy of 0.87 ± 0.13 (chance level 0.33).
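      The comparison between a cross-validated accuracy and the chance level can be sketched as follows. To keep the example free of external dependencies, a nearest-centroid classifier is used as a stand-in for the linear SVM, and the data are synthetic and deliberately well separated; the fold count, the feature values and all function names are assumptions made for illustration, so the printed accuracy does not reproduce the values reported above.

```python
import random
from statistics import mean, stdev


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_val_accuracy(xs, ys, n_folds=5, seed=0):
    """Return the per-fold accuracies of a shuffled k-fold split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        hits = sum(nearest_centroid_predict(tx, ty, xs[i]) == ys[i]
                   for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Synthetic two-feature data with a vendor-dependent offset (illustrative only).
rng = random.Random(1)
centers = {"V1": (0.0, 0.0), "V2": (8.0, 0.0), "V3": (0.0, 8.0)}
xs, ys = [], []
for vendor, (cx, cy) in centers.items():
    for _ in range(30):
        xs.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
        ys.append(vendor)

scores = cross_val_accuracy(xs, ys)
chance = 1 / len(centers)  # one third for three balanced classes
print(f"accuracy = {mean(scores):.2f} ± {stdev(scores):.2f} (chance {chance:.2f})")
```

      Reporting the mean and standard deviation over folds next to the chance level, as done in the text, makes it immediate to see whether vendor information leaks into the features.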

      4.4 Traveling brain variability

      Figs. 6 and 7 report the Bland-Altman plots for the assessment of the reproducibility across different vendors of subcortical volume and cortical thickness measurements, respectively. In each figure, the first row reports the comparisons for the traveling data collected in AN, while the second row shows the results for the traveling data collected in AS. In no case does the mean value of the difference differ significantly from 0 on the basis of a one-sample t-test, indicating that there are no systematic biases.
      Fig. 6. Bland-Altman plots for the reproducibility assessment of the measurements of subcortical volumes across different vendors (V1, V2, V3), considering the traveling brain data collected in AN (first row, 4 subjects) and AS (second row, 3 subjects). To simplify reading, the homologous structures in left and right hemispheres were represented with the same color.
      Fig. 7. Bland-Altman plots for the reproducibility assessment of the measurements of cortical thickness of brain parcels across different vendors (V1, V2, V3), considering the traveling brain data collected in AN (first row, 4 subjects) and AS (second row, 3 subjects). To simplify reading, the homologous structures in left and right hemispheres were represented with the same color.
      The mean percentage variation in the measured volumes of deep structures ranges from 0.2% to 1.3%, while the standard deviation ranges in both data sets from about 5% to 8%. The highest variations, associated with a lower reproducibility level, are found for the nuclei accumbens, which, as already stated, are small and challenging structures to segment (purple dots in Fig. 6).
      Analogously, for the measurement of cortical thicknesses, the mean value of the percentage of variation goes from 0.2% to 3.1%, with the standard deviation ranging from about 4% to 7%. In these structures, the highest variations in the AN traveling data set seem to be related to cortical regions with high thickness values such as some temporal regions (temporal pole, the transverse temporal cortex), the insula, and the entorhinal cortex. However, the same trend does not show in the AS traveling subject data set, where the values outside the 95% distribution boundaries are more spread across the entire range of thickness values.
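      The quantities summarized in a Bland-Altman analysis can be sketched as follows. The sketch assumes that the percentage variation of each pair is the difference divided by the pair mean, and that the 95% boundaries are the conventional mean ± 1.96 SD; the function name and the volumes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(a, b):
    """Summarize paired measurements a and b as percentage differences."""
    # Percentage difference of each pair relative to the pair mean.
    diffs = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    # Conventional 95% limits of agreement.
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # One-sample t statistic testing whether the bias differs from 0.
    t_stat = bias / (sd / sqrt(len(diffs)))
    return bias, sd, limits, t_stat


# Hypothetical volumes (mm^3) of one structure measured on two scanners.
scanner_a = [4100.0, 3950.0, 4300.0, 4020.0]
scanner_b = [4000.0, 4050.0, 4200.0, 4100.0]
bias, sd, limits, t_stat = bland_altman(scanner_a, scanner_b)
print(f"bias = {bias:.2f}%, SD = {sd:.2f}%, t = {t_stat:.2f}")
```

      The t statistic would be compared with a Student t distribution with n − 1 degrees of freedom to obtain the p value; that step is omitted here to keep the sketch dependency-free.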
      In the Supplementary material, the uncorrected and FDR-corrected p values of the paired t-tests for the repeated subcortical volume and cortical thickness measurements with different vendors are reported for AN and AS subjects.
      For the subcortical regions, no significant differences between any pair of vendors were found in AN, while in AS only the volume of the thalamus differed significantly between V1 and V2.
      For the cortical regions, significant differences were found between V1 and V2 in AN (inferior temporal gyrus, lateral orbitofrontal gyrus, postcentral gyrus, superior parietal gyrus) and in AS (inferior temporal gyrus). Analogously, V1 and V3 show significant differences both in AN (inferior temporal gyrus, postcentral gyrus) and in AS (inferior temporal gyrus). No statistically significant differences were found between V2 and V3 in either AN or AS.

      5. Discussion

      The work of the RIN – Neuroimaging Network started to address some of the urgent challenges for a full exploitation of MRI biomarkers for diagnosis and prognosis in the field of neurology [Jovicich J., Barkhof F., Babiloni C., Herholz K., Mulert C., Berckel B.N.M., et al. Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.]. In particular, the project developed Standard Operating Procedures for the acquisition of T1-weighted MRI of the brain, adapted to multivendor scenarios and suitable for the equipment available in hospitals. On this basis, in this study a set of reference values for different anatomical cerebral structures was extracted from a population of young healthy subjects, and the residual variability after harmonization of the acquisition protocol was assessed in cortical and subcortical regions by segmenting the images with one of the most widespread tools (FreeSurfer). We observed a residual intra-site minimum variation that ranges from about 2% to 10% and a maximum intra-site variation that ranges from 16% to 28%. The inter-site variability calculated on the entire multicentric dataset ranges from about 11% to 19%, whilst the intra-vendor variability ranges from 5% to 18%. As expected, the volume variability changes considerably depending on the intrinsic characteristics of the segmented subcortical structures, with some distributions across sites that are statistically different, even after total intracranial volume normalization, in specific structures such as the hippocampus, amygdala, globus pallidus and nucleus accumbens. The same applies to cortical thickness measurements, for which the percentage variation is lower: the intra-site minimum variation ranges from 0.3% to 5% and the maximum ranges from about 5% to 16%. The inter-site variability of cortical thickness calculated on the entire dataset ranges from 3% to 10%, similarly to the intra-vendor variability, which ranges from 3% to 12%. The thickness distributions of cortical parcels are statistically different across sites in about 30% of regions, equally distributed between hemispheres (11 ROIs in the left and 10 ROIs in the right hemisphere).
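      The variability figures quoted above are percentage standard deviations. A minimal sketch of how such values could be computed is given below, assuming that %SD denotes 100 × sample SD / mean and that the inter-site figure is obtained by pooling the subjects of all sites; the site names and thickness values are hypothetical.

```python
from statistics import mean, stdev


def percent_sd(values):
    """Percentage standard deviation: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical cortical thickness values (mm) of one ROI, grouped by site.
by_site = {
    "site_A": [2.50, 2.60, 2.55, 2.45],
    "site_B": [2.70, 2.58, 2.62, 2.66],
}

# Intra-site variability: one %SD per site (min, max and mean follow from these).
intra_site = {site: percent_sd(v) for site, v in by_site.items()}

# Inter-site variability: %SD over the pooled dataset (assumed definition).
pooled = [x for v in by_site.values() for x in v]
inter_site = percent_sd(pooled)

print({k: round(v, 1) for k, v in intra_site.items()}, round(inter_site, 1))
```

      With only a handful of subjects per site, each intra-site estimate rests on very few values, which is one reason why the minimum and maximum intra-site figures span such a wide range.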
      Multicentric studies targeting alterations of specific brain regions in terms of volume or thickness usually plan the multicentric acquisition settings to minimize the variability due to acquisition parameters. We observed in this study that, even after the definition of Standard Operating Procedures for MRI that minimize the variability in the acquisition parameters, complete image harmonization is not achieved. A non-negligible residual variability is present, due to the test–retest variability combined with the variations in T1-weighted images induced by the input parameters specific to each vendor. Thus, the expected pathological effect (e.g. the amount of cortical thinning in a specific region of interest or the volume enlargement in a deep structure due to pathophysiological mechanisms) in such studies must be compared to this residual variability in order to estimate appropriate sample sizes (both at the global and at the intra-site level).
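      How the ratio between residual variability and expected effect drives the sample size can be sketched with a textbook formula. The example assumes a two-sided, two-sample comparison of means under a normal approximation, n = 2 (z_{1-α/2} + z_{1-β})² (σ/δ)²; this is not the procedure of the study, and the function name, the significance level, the power and the percentages are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd_percent, effect_percent, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means
    (normal approximation; an assumed textbook formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd_percent / effect_percent) ** 2)


# Example: 5% measurement variability, expected effects of 5% and 2.5%.
print(n_per_group(5.0, 5.0), n_per_group(5.0, 2.5))
```

      Halving the expected effect at fixed variability roughly quadruples the required number of subjects, which is why the residual variability has to be known before a multicentric study is planned.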
      Indeed, the analysis of quality control measures confirmed that T1-weighted MRI images of the brain are still strongly dependent on the vendor in terms of contrast-to-noise and signal-to-noise ratios in different brain tissues, even after the definition of Standard Operating Procedures for brain MRI acquisition; this is in part also observable in Fig. 2. On the other hand, the same analysis ruled out the possibility that systematic signal alterations with a significant impact on the measurements of brain structures were due to the number of channels of the head coils or to a specific scanner model of the same vendor (Fig. 5).
      However, it is important to point out that differences in the distributions of quality control metrics could also be generated by the imperfect harmonization of T1-weighted sequences across vendors. An ideal match of different sequences with the same weighting, but from different vendors, would have required changing several variables that are often not accessible to the radiographer. Instead, the SOPs were developed to help the operator set up the protocol on a commercial scanner equipped with common sequences by changing simple parameters.
      Even if beneficial, the definition of SOPs does not guarantee the similarity of quantitative volumetric measures. The CNR between gray and white matter seems to be the main driving feature for the automated segmentation of gray matter structures. Indeed, even though the intensity range is visually well matched for vendors 1 and 2 (Fig. 2), the CNR is different (as shown in Fig. 4, Fig. 5) and some discrepancies appear in intra-subject measurements (Fig. 6, Fig. 7). Conversely, when the difference in CNR is smaller (vendors 1 and 3) despite a remarkable visual difference (Fig. 2), the gray matter measures on the same subjects are more similar.
      The residual impact of the scanner vendor on the brain measurements was detectable with a very simple machine learning experiment on vendor prediction, which obtained accuracy values not compatible with the chance level. This is in line with previous studies [Ferrari E., Bosco P., Calderoni S., Oliva P., Palumbo L., Spera G., et al. Dealing with confounders and outliers in classification medical studies: The Autism Spectrum Disorders case study.], which demonstrated that a poorly designed training set can produce sample- and site-dependent classifiers: classifiers originally intended for the detection of novel anatomical biomarkers of pathologies can show significantly positive performances owing to an underlying, uncontrolled capability for site classification. For these reasons, particular care should be taken in designing machine learning experiments on multivariate measurements derived from T1-weighted MRI, and in deep learning approaches, which are even more sensitive to subtle intensity variations due to scanner properties, even for images acquired with Standard Operating Procedures and well-controlled protocols.
      Traveling subjects’ analyses showed a good agreement in both subcortical and cortical measurements obtained on the same subjects with scanners from different vendors. The residual variability in measuring the volumes of deep structures, calculated as the standard deviation of percentage variation in Bland-Altman plots, ranges on both data sets from about 5% to 8% depending on the considered structures. The highest values of variation, indicating lower levels of reproducibility, are related to nuclei accumbens, which are small and challenging structures to segment.
      Regarding the measurements of cortical thickness, the standard deviation values range in both data sets from about 4% to 7%; the highest variations in the AN traveling data set seem to be related to cortical regions with high thickness values, such as the temporal regions (temporal pole, transverse temporal cortex), the insula, and the entorhinal cortex.
      Except for the difference between V1 and V2 of AS in the thalamus, the specific areas that show statistically significant differences are cortical regions: the inferior temporal gyrus, the lateral orbitofrontal gyrus, the postcentral gyrus and the superior parietal gyrus. The higher variability in these regions may be mainly related to an increased test–retest variability [Knussmann GN, Anderson JS, Prigge MBD, Dean DC, Lange N, Bigler ED, et al. Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images. Neuroimage: Reports 2022;2:100086. doi: 10.1016/J.YNIRP.2022.100086.].
      The order of magnitude of these intra-subject percent variations must be put in the context of normal aging or of pathological alterations such as those related to neurodegenerative processes [Pini L., Pievani M., Bocchetta M., Altomare D., Bosco P., Cavedo E., et al. Brain atrophy in Alzheimer’s Disease and aging.]. For example, the pattern of atrophy is threefold milder in normal aging than in Alzheimer’s Disease (AD) (5 vs. 18% in the medial temporal lobe [Bakkour A., Morris J.C., Dickerson B.C. The cortical signature of prodromal AD: regional thinning predicts mild AD dementia.; Dickerson BC, Bakkour A, Salat DH, Feczko E, Pacheco J, Greve DN, et al. The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb Cortex 2009;19:497–510. doi:10.1093/cercor/bhn113.]) and the annual rate of atrophy in these areas is significantly less pronounced (0.5 vs. 3% in aging vs. AD [Fjell A.M., Westlye L.T., Amlien I., Espeseth T., Reinvang I., Raz N., et al. Minute effects of sex on the aging brain: a multisample magnetic resonance imaging study of healthy aging and Alzheimer’s disease.]).
      Longitudinal studies on specific anatomical MRI biomarkers of the brain should then be designed according to these variability values and to the expected pathological effect in order to determine the correct experimental sample sizes.
      In general, even though quality control measures remain strongly dependent on the scanner vendor even after the definition of the acquisition protocol, the agreement on the traveling brain anatomical measurements suggests that good reproducibility can be achieved at the inter-site level on the same subjects, with an overall variability of 5–8%. This intra-subject variability, which is mainly due to the residual differences after image protocol definition, contributes to the measured intra-vendor (5–18%) and mean intra-site (9–17%) variability, which were evaluated on different subjects and are thus affected by the inter-subject variability component too. However, the capability of a simple SVM classifier to identify the scanner vendor with an accuracy well above the chance level underlines the risk that multivariate approaches can be particularly sensitive to subtle changes in image intensity that can be reflected in high-level anatomical measures.

      6. Limitations

      The global variability that we assessed in the multicentric experiment has many different sources: the test–retest variability [Knussmann GN, Anderson JS, Prigge MBD, Dean DC, Lange N, Bigler ED, et al. Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images. Neuroimage: Reports 2022;2:100086. doi: 10.1016/J.YNIRP.2022.100086.; Melzer TR, Keenan RJ, Leeper GJ, Kingston-Smith S, Felton SA, Green SK, et al. Test-retest reliability and sample size estimates after MRI scanner relocation. Neuroimage 2020;211:116608. doi: 10.1016/J.NEUROIMAGE.2020.116608.; Maclaren J., Han Z., Vos S.B., Fischbein N., Bammer R. Reliability of brain volume measurements: a test-retest dataset.], the inter-vendor variability, the inter-site variability and the inter-subject variability. By disaggregating the data by vendor, the intra-vendor variability was assessed in order to check whether systematic biases could be observed; by performing the traveling brain experiment, the inter-vendor/inter-site variability was assessed. However, in all these cases we did not assess the test–retest variability, which intrinsically contributes to each of them. Since the aim of our study was to assess the variability and reproducibility of morphometric measures derived from T1w images across different sites in a clinical setting after providing Standard Operating Procedures (one of the most common scenarios in clinical research), we must be aware that the test–retest variability will always contribute to the global variability.
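      If, as a simplifying assumption that is not made explicit in the study, the sources listed above are treated as independent and additive, their contributions combine in quadrature. The symbols below are introduced only for this illustration:

```latex
\sigma_{\mathrm{global}}^{2} \;\approx\;
  \sigma_{\mathrm{subject}}^{2}
  + \sigma_{\mathrm{site}}^{2}
  + \sigma_{\mathrm{vendor}}^{2}
  + \sigma_{\mathrm{retest}}^{2}
```

      Under this assumption, the test–retest term is present in every estimate obtained by grouping the data differently, so none of the variability values reported here isolates a single source.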
      As discussed above, another limitation of this study is the non-ideal harmonization of T1w sequences across vendors. The Standard Operating Procedures were defined by looking for a compromise between protocol matching and image acquisition in a clinical environment, imposing a uniform spatial resolution and a similar acquisition time. To minimize the variability across vendors, a better harmonization of parameters should be carried out, even if this could require modifying advanced sequence variables, which is not easily feasible in a clinical setting.
      As described in the Methods section, the study was designed by using one segmentation algorithm only. The choice was due to its widespread use and to its well-known characterization in many contexts. Different segmentation approaches could in principle have different impacts on the evaluation of the variability of anatomical measurements at both subcortical and cortical levels [Palumbo L., Bosco P., Fantacci M.E., Ferrari E., Oliva P., Spera G., et al. Evaluation of the intra- and inter-method agreement of brain MRI segmentation software packages: a comparison between SPM12 and FreeSurfer v6.0.], being more or less sensitive to subtle signal variations in different brain areas. The results for both intra- and inter-site variability could be affected by the small number of subjects collected at each site (mean and standard deviation of 5.5 ± 1.1 subjects per site), which can produce an overestimation of such variability. The size of the two traveling brain experiments is also limited. Further studies should increase the intra-site sampling in order to reach a more robust statistical evaluation, along with larger traveling brain experiments at different sites.

      7. Conclusions

      The work of the RIN – Neuroimaging Network allowed the acquisition, in a multicentric framework, of a normative dataset of cerebral T1-weighted MRI of young healthy subjects by using Standard Operating Procedures. The analyses of the MRI-derived measurements allowed the extraction of normative anatomical reference values together with their variability. The acquisitions with the same protocol on a dataset of traveling subjects made it possible to disentangle the contribution of subject anatomical variability from the vendor impact. Although a good agreement was shown, the impact of the acquisition scanner on the MRI-derived anatomical measures is still not negligible and is detectable through simple data mining approaches, particularly through multivariate classifiers.

      8. The RIN Neuroimaging Network

      Maria Grazia Bruzzone (Fondazione IRCCS Istituto Neurologico Carlo Besta), Claudia A. M. Gandini Wheeler-Kingshott (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, UCL Queen Square Institute of Neurology, University of Pavia), Michela Tosetti (Fondazione IRCCS Stella Maris), Alberto Redolfi (IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli), Egidio D'Angelo (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, University of Pavia), Gianluigi Forloni (Istituto di Ricerche Farmacologiche Mario Negri IRCCS), Raffaele Agati (IRCCS Istituto delle Scienze Neurologiche di Bologna), Marco Aiello (IRCCS SDN Istituto di Ricerca), Elisa Alberici (IRCCS Istituti Clinici Scientifici Maugeri), Carmelo Amato (Oasi Research Institute-IRCCS), Domenico Aquino (Fondazione IRCCS Istituto Neurologico Carlo Besta), Filippo Arrigoni (Istituto Scientifico, IRCCS E. Medea), Francesca Baglio (IRCCS Fondazione don Carlo Gnocchi onlus), Stefano Bastianello (Fondazione IRCCS Istituto Neurologico Naz.le Mondino), Laura Biagi (Fondazione IRCCS Stella Maris), Lilla Bonanno (IRCCS Centro Neurolesi Bonino Pulejo), Paolo Bosco (Fondazione IRCCS Stella Maris), Francesca Bottino (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Marco Bozzali (Fondazione IRCCS Santa Lucia), Chiara Carducci (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Irene Carne (IRCCS Istituti Clinici Scientifici Maugeri), Lorenzo Carnevale (IRCCS Neuromed), Antonella Castellano (IRCCS Ospedale San Raffaele), Carlo Cavaliere (IRCCS SDN Istituto di Ricerca), Mattia Colnaghi (Istituto Auxologico Italiano IRCCS), Giorgio Conte (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico), Mauro Costagli (University of Genova; Fondazione IRCCS Stella Maris), Silvia De Francesco (IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli), Greta Demichelis (Fondazione IRCCS Istituto Neurologico Carlo Besta), Valeria Elisa Contarino (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico), Andrea Falini (IRCCS Ospedale San Raffaele), Stefania Ferraro (Fondazione IRCCS Istituto Neurologico Carlo Besta), Giulio Ferrazzi (IRCCS Ospedale San Camillo), Lorenzo Figà Talamanca (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Cira Fundarò (IRCCS Istituti Clinici Scientifici Maugeri), Simona Gaudino (IRCCS Fondazione Policlinico Universitario Agostino Gemelli), Francesco Ghielmetti (Fondazione IRCCS Istituto Neurologico Carlo Besta), Ruben Gianeri (Fondazione IRCCS Istituto Neurologico Carlo Besta), Giovanni Giulietti (Fondazione IRCCS Santa Lucia), Marco Grimaldi (IRCCS Istituto Clinico Humanitas), Antonella Iadanza (IRCCS Ospedale San Raffaele), Marta Lancione (Fondazione IRCCS Stella Maris), Fabrizio Levrero (IRCCS Ospedale Policlinico San Martino), Raffaele Lodi (IRCCS Istituto delle Scienze Neurologiche di Bologna), Daniela Longo (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Giulia Lucignani (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Martina Lucignani (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Maria Luisa Malosio (IRCCS Istituto Clinico Humanitas), Vittorio Manzo (Istituto Auxologico Italiano, IRCCS), M. Marcella Laganà (IRCCS Fondazione don Carlo Gnocchi onlus), Silvia Marino (IRCCS Centro Neurolesi Bonino Pulejo), Jean Paul Medina (Fondazione IRCCS Istituto Neurologico Carlo Besta), Edoardo Micotti (Istituto di Ricerche Farmacologiche Mario Negri IRCCS), Claudia Morelli (Istituto Auxologico Italiano IRCCS), Alessio Moscato (IRCCS Istituti Clinici Scientifici Maugeri), Antonio Napolitano (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Anna Nigri (Fondazione IRCCS Istituto Neurologico Carlo Besta), Francesco Padelli (Fondazione IRCCS Istituto Neurologico Carlo Besta), Sara Palermo (Fondazione IRCCS Istituto Neurologico Carlo Besta), Fulvia Palesi (Fondazione IRCCS Istituto Neurologico Naz.le Mondino, University of Pavia), Patrizia Pantano (IRCCS Neuromed), Chiara Parrillo (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Luigi Pavone (IRCCS Neuromed), Denis Peruzzo (Istituto Scientifico, IRCCS E. Medea), Nikolaos Petsas (IRCCS Neuromed), Alice Pirastru (IRCCS Fondazione don Carlo Gnocchi onlus), Letterio S. Politi (IRCCS Istituto Clinico Humanitas), Luca Roccatagliata (IRCCS Ospedale Policlinico San Martino), Elisa Rognone (Fondazione IRCCS Istituto Neurologico Naz.le Mondino), Andrea Rossi (Ospedale Pediatrico Istituto Giannina Gaslini, Università di Genova), Maria Camilla Rossi-Espagnet (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Claudia Ruvolo (IRCCS Centro Neurolesi Bonino Pulejo), Marco Salvatore (IRCCS SDN Istituto di Ricerca), Giovanni Savini (IRCCS Istituto Clinico Humanitas), Fabrizio Tagliavini (Fondazione IRCCS Istituto Neurologico Carlo Besta), Emanuela Tagliente (IRCCS Istituto Ospedale Pediatrico Bambino Gesù), Claudia Testa (IRCCS Istituto delle Scienze Neurologiche di Bologna), Caterina Tonon (IRCCS Istituto delle Scienze Neurologiche di Bologna), Domenico Tortora (Ospedale Pediatrico Istituto Giannina Gaslini), Fabio Maria Triulzi (Fondazione IRCCS Cà Granda Osp. Maggiore Policlinico).

      Funding

      This study was funded by the Italian Ministry of Health under the RC grant, by the 5x1000 voluntary contributions to IRCCS Fondazione Stella Maris, and under the following RIN projects: RRC-2016-2361095; RRC-2017-2364915; RRC-2018-2365796; RCR-2019-23669119_001, along with the contribution of the Ministry of Economy and Finance (CCR-2017-23669078).

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Appendix A. Supplementary data

      The following are the Supplementary data to this article:

      References

        • Pini L.
        • Pievani M.
        • Bocchetta M.
        • Altomare D.
        • Bosco P.
        • Cavedo E.
        • et al.
        Brain atrophy in Alzheimer’s Disease and aging.
        Ageing Res Rev. 2016; 30: 25-48
        • Bosco P.
        • Redolfi A.
        • Bocchetta M.
        • Ferrari C.
        • Mega A.
        • Galluzzi S.
        • et al.
        The impact of automated hippocampal volumetry on diagnostic confidence in patients with suspected Alzheimer’s disease: A European Alzheimer’s Disease Consortium study.
        Alzheimer’s Dement. 2017; 13: 1013-1023
        • Fennema-Notestine C.
        • Hagler D.J.
        • McEvoy L.K.
        • Fleisher A.S.
        • Wu E.H.
        • Karow D.S.
        • et al.
        Structural MRI biomarkers for preclinical and mild Alzheimer’s disease.
        Hum Brain Mapp. 2009; 30: 3238-3253
        • Rohrer J.D.
        Structural brain imaging in frontotemporal dementia.
        Biochim Biophys Acta. 2012; 1822: 325-332 https://doi.org/10.1016/J.BBADIS.2011.07.014
        • Meyer S.
        • Mueller K.
        • Stuke K.
        • Bisenius S.
        • Diehl-Schmid J.
        • Jessen F.
        • et al.
        Predicting behavioral variant frontotemporal dementia with pattern classification in multi-center structural MRI data.
        NeuroImage Clin. 2017; 14: 656-662
        • Ibarretxe-Bilbao N.
        • Junque C.
        • Marti M.J.
        • Tolosa E.
        Brain structural MRI correlates of cognitive dysfunctions in Parkinson’s disease.
        J Neurol Sci. 2011; 310: 70-74 https://doi.org/10.1016/J.JNS.2011.07.054
        • Sarasso E.
        • Agosta F.
        • Piramide N.
        • Filippi M.
        Progression of grey and white matter brain damage in Parkinson’s disease: a critical review of structural MRI literature.
        J Neurol. 2021; 268: 3144-3179 https://doi.org/10.1007/S00415-020-09863-8
        • Whitwell J.L.
        • Weigand S.D.
        • Shiung M.M.
        • Boeve B.F.
        • Ferman T.J.
        • Smith G.E.
        • et al.
        Focal atrophy in dementia with Lewy bodies on MRI: a distinct pattern from Alzheimer’s disease.
        Brain A J Neurol. 2007; 130: 708-719
        • Hanyu H.
        • Shimizu S.
        • Tanaka Y.
        • Hirao K.
        • Iwamoto T.
        • Abe K.
        MR features of the substantia innominata and therapeutic implications in dementias.
        Neurobiol Aging. 2007; 28: 548-554 https://doi.org/10.1016/J.NEUROBIOLAGING.2006.02.009
        • Wright I.C.
        • Rabe-Hesketh S.
        • Woodruff P.W.R.
        • David A.S.
        • Murray R.M.
        • Bullmore E.T.
        Meta-analysis of regional brain volumes in schizophrenia.
        Am J Psychiatry. 2000; 157: 16-25 https://doi.org/10.1176/AJP.157.1.16
        • Lawrie S.M.
        • Abukmeil S.S.
        Brain abnormality in schizophrenia. A systematic and quantitative review of volumetric magnetic resonance imaging studies.
        Br J Psychiatry. 1998; 172: 110-120 https://doi.org/10.1192/BJP.172.2.110
        • Andreescu C.
        • Butters M.A.
        • Begley A.
        • Rajji T.
        • Wu M.
        • Meltzer C.C.
        • et al.
        Gray matter changes in late life depression—a structural MRI analysis.
        Neuropsychopharmacology. 2008; 33: 2566-2572
        • Amico F.
        • Meisenzahl E.
        • Koutsouleris N.
        • Reiser M.
        • Möller H.J.
        • Frodl T.
        Structural MRI correlates for vulnerability and resilience to major depressive disorder.
        J Psychiatry Neurosci. 2011; 36: 15 https://doi.org/10.1503/JPN.090186
        • Amaral D.G.
        • Schumann C.M.
        • Nordahl C.W.
        Neuroanatomy of autism.
        Trends Neurosci. 2008; 31: 137-145 https://doi.org/10.1016/J.TINS.2007.12.005
        • Ecker C.
        The neuroanatomy of autism spectrum disorder: an overview of structural neuroimaging findings and their translatability to the clinical setting.
        Autism. 2017; 21: 18-28 https://doi.org/10.1177/1362361315627136
        • Bosco P.
        • Giuliano A.
        • Delafield-Butt J.
        • Muratori F.
        • Calderoni S.
        • Retico A.
        Brainstem enlargement in preschool children with autism: Results from an intermethod agreement study of segmentation algorithms.
        Hum Brain Mapp. 2019; 40: 7-19 https://doi.org/10.1002/hbm.24351
        • Preston J.L.
        • Molfese P.J.
        • Mencl W.E.
        • Frost S.J.
        • Hoeft F.
        • Fulbright R.K.
        • et al.
        Structural brain differences in school-age children with residual speech sound errors.
        Brain Lang. 2014; 128: 25-33
        • Kadis D.S.
        • Goshulak D.
        • Namasivayam A.
        • Pukonen M.
        • Kroll R.
        • De Nil L.F.
        • et al.
        Cortical thickness in children receiving intensive therapy for idiopathic apraxia of speech.
        Brain Topogr. 2014; 27: 240-247
        • Conti E.
        • Retico A.
        • Palumbo L.
        • Spera G.
        • Bosco P.
        • Biagi L.
        • et al.
        Autism spectrum disorder and childhood apraxia of speech: early language-related hallmarks across structural MRI study.
        J Pers Med. 2020; 10: 275
        • Jovicich J.
        • Barkhof F.
        • Babiloni C.
        • Herholz K.
        • Mulert C.
        • Berckel B.N.M.
        • et al.
        Harmonization of neuroimaging biomarkers for neurodegenerative diseases: A survey in the imaging community of perceived barriers and suggested actions.
        Alzheimer’s Dement (Amsterdam, Netherlands). 2019; 11: 69-73
        • Ferrari E.
        • Bosco P.
        • Calderoni S.
        • Oliva P.
        • Palumbo L.
        • Spera G.
        • et al.
        Dealing with confounders and outliers in classification medical studies: The Autism Spectrum Disorders case study.
        Artif Intell Med. 2020; 108: 101926
        • Jack C.R.
        • Bernstein M.A.
        • Fox N.C.
        • Thompson P.
        • Alexander G.
        • Harvey D.
        • et al.
        The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods.
        J Magn Reson Imaging. 2008; 27: 685-691
        • Thompson P.M.
        • Stein J.L.
        • Medland S.E.
        • Hibar D.P.
        • Vasquez A.A.
        • Renteria M.E.
        • et al.
        The ENIGMA Consortium: large-scale collaborative analyses of neuroimaging and genetic data.
        Brain Imaging Behav. 2014; 8: 153-182
        • Nigri A.
        • Ferraro S.
        • Gandini Wheeler-Kingshott C.A.M.
        • Tosetti M.
        • Redolfi A.
        • Forloni G.
        • et al.
        Quantitative MRI harmonization to maximize clinical impact: the RIN–neuroimaging network.
        Front Neurol. 2022; 13 https://doi.org/10.3389/fneur.2022.855125
        • Lancione M.
        • Bosco P.
        • Costagli M.
        • Nigri A.
        • Aquino D.
        • Carne I.
        • et al.
        Multi-centre and multi-vendor reproducibility of a standardized protocol for quantitative susceptibility mapping of the human brain at 3T.
        Phys Med. 2022; 103: 37-45
        • Fischl B.
        • Salat D.H.
        • Busa E.
        • Albert M.
        • Dieterich M.
        • Haselgrove C.
        • et al.
        Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain.
        Neuron. 2002; 33: 341-355
        • Fischl B.
        FreeSurfer.
        Neuroimage. 2012; 62: 774-781 https://doi.org/10.1016/j.neuroimage.2012.01.021
        • Desikan R.S.
        • Ségonne F.
        • Fischl B.
        • Quinn B.T.
        • Dickerson B.C.
        • Blacker D.
        • et al.
        An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.
        Neuroimage. 2006; 31: 968-980
        • Whitwell J.L.
        • Crum W.R.
        • Watt H.C.
        • Fox N.C.
        Normalization of cerebral volumes by use of intracranial volume: Implications for longitudinal quantitative MR imaging.
        Am J Neuroradiol. 2001; 22: 1483-1489
        • Esteban O.
        • Birman D.
        • Schaer M.
        • Koyejo O.O.
        • Poldrack R.A.
        • Gorgolewski K.J.
        • et al.
        MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites.
        PLoS One. 2017; 12: e0184661
        • Magnotta V.A.
        • Friedman L.
        Measurement of signal-to-noise and contrast-to-noise in the fBIRN multicenter imaging study.
        J Digit Imaging. 2006; 19: 140-147
        • Atkinson D.
        • Hill D.L.G.
        • Stoyle P.N.R.
        • Summers P.E.
        • Keevil S.F.
        Automatic correction of motion artifacts in magnetic resonance images using an entropy focus criterion.
        IEEE Trans Med Imaging. 1997; 16: 903-910 https://doi.org/10.1109/42.650886
        • Gronenschild E.H.B.M.
        • Habets P.
        • Jacobs H.I.L.
        • Mengelers R.
        • Rozendaal N.
        • van Os J.
        • et al.
        The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements.
        PLoS One. 2012; 7: e38234
        • Cortes C.
        • Vapnik V.
        Support-vector networks.
        Mach Learn. 1995; 20: 273-297 https://doi.org/10.1007/BF00994018
        • Knussmann G.N.
        • Anderson J.S.
        • Prigge M.B.D.
        • Dean D.C.
        • Lange N.
        • Bigler E.D.
        • et al.
        Test-retest reliability of FreeSurfer-derived volume, area and cortical thickness from MPRAGE and MP2RAGE brain MRI images.
        Neuroimage: Reports. 2022; 2: 100086 https://doi.org/10.1016/J.YNIRP.2022.100086
        • Bakkour A.
        • Morris J.C.
        • Dickerson B.C.
        The cortical signature of prodromal AD: regional thinning predicts mild AD dementia.
        Neurology. 2009; 72: 1048-1055 https://doi.org/10.1212/01.wnl.0000340981.97664.2f
        • Dickerson B.C.
        • Bakkour A.
        • Salat D.H.
        • Feczko E.
        • Pacheco J.
        • Greve D.N.
        • et al.
        The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals.
        Cereb Cortex. 2009; 19: 497-510 https://doi.org/10.1093/cercor/bhn113
        • Fjell A.M.
        • Westlye L.T.
        • Amlien I.
        • Espeseth T.
        • Reinvang I.
        • Raz N.
        • et al.
        Minute effects of sex on the aging brain: a multisample magnetic resonance imaging study of healthy aging and Alzheimer’s disease.
        J Neurosci Off J Soc Neurosci. 2009; 29: 8774-8783
        • Melzer T.R.
        • Keenan R.J.
        • Leeper G.J.
        • Kingston-Smith S.
        • Felton S.A.
        • Green S.K.
        • et al.
        Test-retest reliability and sample size estimates after MRI scanner relocation.
        Neuroimage. 2020; 211: 116608 https://doi.org/10.1016/J.NEUROIMAGE.2020.116608
        • Maclaren J.
        • Han Z.
        • Vos S.B.
        • Fischbein N.
        • Bammer R.
        Reliability of brain volume measurements: a test-retest dataset.
        Sci Data. 2014; : 1 https://doi.org/10.1038/SDATA.2014.37
        • Palumbo L.
        • Bosco P.
        • Fantacci M.E.
        • Ferrari E.
        • Oliva P.
        • Spera G.
        • et al.
        Evaluation of the intra- and inter-method agreement of brain MRI segmentation software packages: a comparison between SPM12 and FreeSurfer v6.0.
        Phys Medica. 2019; 64: 261-272