Technical note | Volume 64, P261-272, August 2019

Evaluation of the intra- and inter-method agreement of brain MRI segmentation software packages: A comparison between SPM12 and FreeSurfer v6.0

Published: August 05, 2019


      • Different segmentation pipelines may provide inconsistent quantification of brain structures.
      • We evaluated the intra- and inter-method agreement of two popular segmentation software packages, SPM12 and FreeSurfer v6.0.
      • SPM provides more consistent results in both the intra- and the inter-method agreement evaluations.
      • There are consistent biases in the estimates of gray matter and white matter between SPM and FreeSurfer.
      • The findings of each study should be cross-validated against different segmentation methods before the results are interpreted.



      A lack of inter-method agreement can produce inconsistent results in neuroimaging studies. We evaluated the intra-method repeatability and the inter-method reproducibility of two widely used automatic segmentation methods for brain MRI: the FreeSurfer (FS) and the Statistical Parametric Mapping (SPM) software packages.


      We segmented the gray matter (GM), the white matter (WM) and subcortical structures in test-retest MRI data of healthy volunteers from the Kirby-21 and OASIS datasets. We used Pearson’s correlation coefficient (r), Bland-Altman plots and the Dice index to study intra-method repeatability and inter-method reproducibility. To test whether different processing methods affect the results of a neuroimaging-based group study, we carried out a statistical comparison between male and female volume measures.
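As an illustration of the three agreement measures named above, the following is a minimal sketch and not the authors' code. It assumes that volumes are available as plain lists of numbers and that segmentations are flattened binary masks. The function names and the sample values are hypothetical and invented for illustration only, and the 1.96 multiplier for the limits of agreement is the common Bland-Altman convention rather than a detail stated in the abstract.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bland_altman(x, y):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # uses the sample standard deviation
    return bias, bias - spread, bias + spread


def dice_index(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) for two flat binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    overlap = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(a) + sum(b)
    # Two empty masks are treated as a perfect match (an assumption).
    return 2.0 * overlap / total if total else 1.0


# Hypothetical test-retest volumes in cm^3, invented for illustration only.
test_vol = [650.0, 700.0, 620.0, 680.0]
retest_vol = [652.0, 698.0, 623.0, 677.0]

r = pearson_r(test_vol, retest_vol)
bias, lower, upper = bland_altman(test_vol, retest_vol)

# Hypothetical flattened binary masks of one structure from two runs.
dice = dice_index([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])

print(f"r = {r:.4f}")
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
print(f"Dice = {dice:.3f}")
```

The same functions apply to both kinds of comparison: passing test and retest outputs of one package gives the intra-method view, while passing SPM and FS outputs for the same scans gives the inter-method view. A high r alone does not rule out a systematic offset, which is why the Bland-Altman bias is reported alongside it.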


      A high correlation was found between test-retest volume measures for both SPM (r in the 0.98–0.99 range) and FS (r in the 0.95–0.99 range). A non-zero bias between test-retest FS volumes was detected for GM and WM in the OASIS dataset. In the inter-method reproducibility analysis, volume correlation values were in the 0.72–0.98 range, and the overlap between the segmented structures, assessed by the Dice index, was in the 0.76–0.83 range. SPM systematically provided significantly greater GM volumes and lower WM and subcortical volumes than FS. In the male vs. female brain volume comparisons, inconsistencies arose for the OASIS dataset, where the gender-related differences appear subtler than in the Kirby-21 dataset.


      The inter-method reproducibility should be evaluated before interpreting the results of neuroimaging studies.


      Published in Physica Medica: European Journal of Medical Physics.


