
Pre-trial quality assurance of diffusion-weighted MRI for radiomic analysis and the role of harmonisation

Open Access. Published: October 27, 2022. DOI: https://doi.org/10.1016/j.ejmp.2022.10.009

      Highlights

      • Mean ADC values are robust across different scanners with imaging protocol standardisation.
      • Radiomic features of ADC maps of a homogeneous phantom show substantial variation between MRI systems, even after ComBat harmonisation.
      • Resampling to a smaller voxel size provides more repeatable and reproducible radiomic features than resampling to a larger voxel size.

      Abstract

      Purpose

      The aim of this study was to perform a quantitative quality assurance of diffusion-weighted MRI to assess the variability of the mean apparent diffusion coefficient (ADC) and other radiomic features across the scanners involved in the REGINA trial.

      Materials and methods

      The NIST/QIBA diffusion phantom was scanned on six 3 T scanners at five centres with a rectum-specific diffusion protocol. All sequences were repeated within each scan session without moving the phantom from the table. Images were resampled with linear interpolation to two isotropic voxel spacings (0.9 and 4 mm), and ComBat feature harmonisation was applied between scanners. The absolute accuracy error was evaluated for the mean ADC. Repeatability and reproducibility within-subject coefficients of variation (wCV) were computed for 142 radiomic features.

      Results

      For the mean ADC, the accuracy error ranged between 0.1 % and 8.5 %, repeatability was < 1 % and reproducibility was < 3 % for diffusivities between 0.4 and 1.1 × 10⁻³ mm²/s. For the other radiomic features, wCV was below 10 % for 24 % and 15 % of features for repeatability with 0.9 mm and 4 mm resampling, respectively, and for 13 % and 11 % of features for reproducibility. ComBat significantly improved the wCV compared with reproducibility without ComBat (p-value < 0.001), but variation remained high for most features.

      Conclusion

      Our study provided the first investigation of feature selection for development of robust predictive models in the REGINA trial, demonstrating the added value of such a quality assurance process to select conventional and radiomic features in prospective multicentre trials.


      Introduction

      Diffusion-weighted imaging (DWI) in magnetic resonance imaging (MRI) noninvasively assesses the random Brownian motion of water molecules. It is widely used for diagnosis owing to its unique soft tissue contrast [
      • Drake-Pérez M.
      • Boto J.
      • Fitsiori A.
      • Lovblad K.
      • Vargas M.I.
      Clinical applications of diffusion weighted imaging in neuroradiology.
      ,
      • Weinreb J.C.
      • Barentsz J.O.
      • Choyke P.L.
      • Cornud F.
      • Haider M.A.
      • Macura K.J.
      • et al.
      PI-RADS prostate imaging – reporting and data system: 2015, Version 2.
      ] that reflects the tissue microstructure [
      • Fornasa F.
      Diffusion-weighted magnetic resonance imaging: what makes water run fast or slow?.
      ]. The apparent diffusion coefficient (ADC) provides a quantitative measure of DWI that is derived from monoexponential modelling of the signal intensity as a function of b-value [
      • Koh D.-M.
      • Collins D.J.
      Diffusion-weighted MRI in the body: applications and challenges in oncology.
      ]. ADC is considered a promising biomarker of therapy response [
      • Padhani A.R.
      • Liu G.
      • Mu-Koh D.
      • Chenevert T.L.
      • Thoeny H.C.
      • Takahara T.
      • et al.
      Diffusion-weighted magnetic resonance imaging as a cancer biomarker: consensus and recommendations.
      ], notably for systemic therapy, by offering indicators with the potential to predict treatment response earlier than conventional size parameters [
      • Mytsyk Y.
      • Pasichnyk S.
      • Dutka I.
      • Dats I.
      • Vorobets D.
      • Skrzypczyk M.
      • et al.
      Systemic treatment of the metastatic renal cell carcinoma: usefulness of the apparent diffusion coefficient of diffusion-weighted MRI in prediction of early therapeutic response.
      ,
      • Galbán C.J.
      • Hoff B.A.
      • Chenevert T.L.
      • Ross B.D.
      Diffusion MRI in early cancer therapeutic response assessment.
      ]. Combining conventional DWI metrics with radiomics – the extraction of supplementary phenotypic information from medical images [
      • Gillies R.J.
      • Kinahan P.E.
      • Hricak H.
      Radiomics: images are more than pictures, they are data.
      ] – offers further opportunities [
      • Bickelhaupt S.
      • Jaeger P.F.
      • Laun F.B.
      • Lederer W.
      • Daniel H.
      • Kuder T.A.
      • et al.
      Radiomics based on adapted diffusion kurtosis imaging helps to clarify most mammographic findings suspicious for cancer.
      ,
      • Cusumano D.
      • Boldrini L.
      • Dhont J.
      • Fiorino C.
      • Green O.
      • Güngör G.
      • et al.
      Artificial Intelligence in magnetic Resonance guided Radiotherapy: medical and physical considerations on state of art and future perspectives.
      ].
      However, DWI and radiomic results vary across scanner platforms, sequences and data processing methods [
      • Jafar M.M.
      • Parsai A.
      • Miquel M.E.
      Diffusion-weighted magnetic resonance imaging in cancer: reported apparent diffusion coefficients, in-vitro and in-vivo reproducibility.
      ,
      • Yip S.S.F.
      • Aerts H.J.W.L.
      Applications and limitations of radiomics.
      ]. This lack of reproducibility limits the application of potential quantitative biomarkers from scientific research to decision-making in clinical practice [
      • O'Connor J.P.B.
      • Aboagye E.O.
      • Adams J.E.
      • Aerts H.J.W.L.
      • Barrington S.F.
      • Beer A.J.
      • et al.
      Imaging biomarker roadmap for cancer studies.
      ]. Standardisation of DWI acquisition techniques across multiple MRI platforms is thus warranted, especially for multicentre clinical trials to reduce bias. This is one of the goals of the Quantitative Imaging Biomarkers Alliance (QIBA), which provides technical performance standards for various imaging modalities, including DWI [
      • Shukla‐Dave A.
      • Obuchowski N.A.
      • Chenevert T.L.
      • Jambawalikar S.
      • Schwartz L.H.
      • Malyarenko D.
      • et al.
      Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials.
      ]. The National Institute of Standards and Technology (NIST), alongside QIBA, has developed a phantom which allows precise assessment of the ADC range that might be encountered in human tissue. The phantom can be used to standardize DWI acquisition schemes across multiple vendors.
      Phantom acquisition, however, comes at an extra cost and cannot be performed for retrospective studies, whose data have already been acquired. To overcome this, the ComBat harmonisation method was introduced in radiomic analysis. This data-driven post-processing step, which does not require phantom data, removes the variability between feature distributions caused by scanner and protocol effects without altering the biological information [
      • Orlhac F.
      • Boughdad S.
      • Philippe C.
      • Stalla-Bourdillon H.
      • Nioche C.
      • Champion L.
      • et al.
      A postreconstruction harmonization method for multicenter radiomic studies in PET.
      ]. The approach has already been validated on different imaging modalities with phantoms and patient data [
      • Orlhac F.
      • Boughdad S.
      • Philippe C.
      • Stalla-Bourdillon H.
      • Nioche C.
      • Champion L.
      • et al.
      A postreconstruction harmonization method for multicenter radiomic studies in PET.
      ,
      • Orlhac F.
      • Frouin F.
      • Nioche C.
      • Ayache N.
      • Buvat I.
      Validation of A method to compensate multicenter effects affecting CT radiomics.
      ,
      • Orlhac F.
      • Lecler A.
      • Savatovski J.
      • Goya-Outi J.
      • Nioche C.
      • Charbonneau F.
      • et al.
      How can we combat multicenter variability in MR radiomics? Validation of a correction procedure.
      ,
      • Saint Martin M.-J.
      • Orlhac F.
      • Akl P.
      • Khalid F.
      • Nioche C.
      • Buvat I.
      • et al.
      A radiomics pipeline dedicated to Breast MRI: validation on a multi-scanner phantom study.
      ,

      Ibrahim A, Refaee T, Leijenaar RTH, Primakov S, Hustinx R, Mottaghy FM, et al. The application of a workflow integrating the variable reproducibility and harmonizability of radiomic features on a phantom dataset. PLoS One 2021;16:e0251147. doi: 10.1371/journal.pone.0251147.

      ].
      The REGINA trial (NCT04503694) is a phase II trial of neoadjuvant regorafenib, in combination with nivolumab and short-course radiotherapy for intermediate-risk stage II-III rectal cancer [
      • Bregni G.
      • Vandeputte C.
      • Pretta A.
      • Senti C.
      • Trevisi E.
      • Acedo Reina E.
      • et al.
      Rationale and design of REGINA, a phase II trial of neoadjuvant regorafenib, nivolumab, and short-course radiotherapy in stage II and III rectal cancer.
      ]. One of the exploratory objectives of this trial is to explore potential predictive imaging biomarkers such as DWI. The objective of this study was to perform quality assurance of DWI acquisition using a physical imaging phantom in all centres participating in the REGINA trial, as recommended by QIBA [
      • Shukla‐Dave A.
      • Obuchowski N.A.
      • Chenevert T.L.
      • Jambawalikar S.
      • Schwartz L.H.
      • Malyarenko D.
      • et al.
      Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials.
      ]. The repeatability and reproducibility of the mean ADC and other radiomic features were also assessed, with and without ComBat harmonisation between scanners.

      Materials and methods

      Study design

      This phantom study comprised several steps. First, mean ADC values of the NIST/QIBA diffusion phantom were compared with the ground truth (accuracy). Second, repeatability and reproducibility were assessed, not only for the mean ADC but also for other radiomic features. Finally, the ComBat method for harmonising these feature values was evaluated. The workflow of our phantom study is shown in Fig. 1. Phantom setup and acquisition were performed under the same conditions as for patients to ensure a reasonable link between quality assurance and patient data.
      Fig. 1. Study workflow. The NIST/QIBA diffusion phantom was scanned three times on six different MRI systems using a standardised protocol. Multiple processing steps were performed: creation of ADC maps, resampling to two voxel sizes, segmentation, feature extraction and ComBat harmonisation. Finally, the accuracy of the mean ADC was assessed, and repeatability and reproducibility, before and after ComBat, were evaluated with the within-subject coefficient of variation (wCV) for all radiomic features. The Wilcoxon test was used to compare the two resampling schemes and the Friedman test to assess the harmonisation step.

      Phantom

      The phantom used in this study was the NIST/QIBA diffusion phantom (Model 128, Qalibre MD, Boulder, Colorado, USA). It consists of 13 vials filled with aqueous solutions of different polymer concentrations (from 0 % to 50 %), giving different ADC values (0.128–1.127 × 10⁻³ mm²/s). One vial with 0 % polymer is in the central position and the others are arranged on two circles around it (inner and outer circle), as shown in Fig. 2. The vials are surrounded by ice water to keep the phantom at 0 °C, as diffusion is temperature-dependent.
      Fig. 2. NIST/QIBA diffusion phantom and its corresponding axial ADC map. The percentage in each vial represents the polymer concentration.
      A single phantom was used across the different institutions. It was always positioned in the same way on every scanner, by the same person, with uniform cylindrical phantoms on both sides to improve the filling factor and foam pads for stabilisation, as shown in Fig. 3. Laser positioning was used to align the centre of the phantom with the isocentre of the magnet, with its central axis parallel to the z-axis of the magnet. Finally, the body-matrix coil was carefully fixed above the phantom.
      Fig. 3. Setup of the NIST/QIBA diffusion phantom.

      MRI

      Phantom measurements were performed on six MR systems across five medical centres. All MR systems had a field strength of 3 T but came from different manufacturers (Siemens, GE, Philips). Before the centre visit, a DWI acquisition protocol designed for the REGINA trial (Supplementary Data 1, Table 1) was sent to each centre with the request to comply with the parameters within the limits of their system. Centres did not need a dedicated research environment to implement it; the protocol ran within the clinical environment. It was implemented by each institution's physicist or the manufacturer's technician.
      After a localiser was acquired to centre the phantom on the images, diffusion data were acquired with an echo-planar imaging (EPI) sequence using nine b-values. Details of each scanner's acquisition protocol are given in Table 1. DWI was performed three times within each scanning session, without moving the phantom between acquisitions, to assess short-term repeatability. On scanner B, measurements were acquired at three time points, one week apart, to assess long-term repeatability.
      Table 1. Diffusion-weighted imaging (DWI) magnetic resonance imaging protocol. Abbreviations: TR = Repetition Time; TE = Echo Time.
      Parameter                      Scanner A        Scanner B        Scanner C        Scanner D        Scanner E        Scanner F
      Manufacturer                   Siemens          Siemens          Siemens          Siemens          Philips          GE
      Model                          Magnetom Vida    Skyra            Skyra            Magnetom Vida    Achieva dStream  Discovery
      Field strength (T)             3                3                3                3                3                3
      Number of coil channels        18               18               18               30               32               32
      Parallel imaging factor        2                2                2                2                2                2
      Percent sampling               100              100              100              100              98.8             100
      Pixel spacing (mm)             1.488/1.488      1.488/1.488      1.488/1.488      1.488/1.488      1.477/1.477      0.977/0.977
      Slice thickness (mm)           4.5              3                3                3                4                4
      Spacing between slices (mm)    4.95             3.3              3.3              3.3              4.3              4.3
      Number of slices               30               30               30               30               31               20
      Acquisition time (min)         10               9                9                10               9                13
      TR (ms)                        7500             6700             6700             7300             4680             4500
      TE (ms)                        62               74               74               65               63               102.1
      Averages per b-value (s/mm²):
        b = 0                        3                3                3                3                2                1
        b = 50                       3                3                3                3                2                1
        b = 150                      3                3                3                3                2                2
        b = 250                      3                3                3                3                2                3
        b = 400                      3                3                3                3                2                5
        b = 600                      3                3                3                3                2                7
        b = 900                      3                3                3                3                2                10
        b = 1200                     3                3                3                3                2                12
        b = 1500                     3                3                3                3                2                16
      Bandwidth (Hz)                 1653             1655             1655             1653             3591             1953.12

      Image processing and segmentation

      Each dataset was analysed by the coordinating centre, first qualitatively with a visual check for artefacts and then quantitatively. The ADC maps were calculated with in-house software from the nine b-values, assuming a mono-exponential signal decay. Noise was estimated from the image with a b-value of 0 s/mm² (b0): the mean and standard deviation (SD) of the signal were calculated in the b0 image, excluding a four-pixel margin at the periphery. ADC was not computed where the signal was less than (mean + 3 SD) × NoiseFactor, with NoiseFactor set to 1.2. Pixels were also excluded when the signal drop between the b0 and b1500 images was less than 5 × SD. Non-computed pixels were set to 0.
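The fit and rejection rules described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' in-house software: it assumes an unweighted log-linear least-squares fit, takes the noise mean and SD as inputs (their estimation from the b0 image is not reproduced), and assumes both thresholds are applied to the b0 signal, which the text does not state explicitly. The function name `fit_adc` is hypothetical.

```python
import numpy as np


def fit_adc(signals, bvals, noise_mean, noise_sd, noise_factor=1.2, drop_factor=5.0):
    """Voxel-wise mono-exponential ADC fit, S(b) = S0 * exp(-b * ADC).

    signals: array of shape (n_b, ...) with one DWI volume per b-value.
    bvals:   b-values in s/mm^2, so the ADC is returned in mm^2/s.
    Rejected voxels are set to 0, like the non-computed pixels in the text.
    """
    signals = np.asarray(signals, dtype=float)
    bvals = np.asarray(bvals, dtype=float)
    b0 = signals[np.argmin(bvals)]
    b_max = signals[np.argmax(bvals)]

    # Rejection rules (assumption: both thresholds are applied to the b0 signal)
    valid = b0 >= (noise_mean + 3.0 * noise_sd) * noise_factor
    valid &= (b0 - b_max) >= drop_factor * noise_sd
    valid &= np.all(signals > 0, axis=0)  # log(S) is undefined otherwise

    # Log-linear least squares, ln S = ln S0 - b * ADC, solved for all voxels at once
    flat = signals.reshape(len(bvals), -1)
    log_s = np.log(np.clip(flat, 1e-12, None))
    design = np.stack([np.ones_like(bvals), -bvals], axis=1)
    coef, *_ = np.linalg.lstsq(design, log_s, rcond=None)
    adc = coef[1].reshape(signals.shape[1:])
    return np.where(valid, adc, 0.0)
```

A weighted or non-linear fit would be less sensitive to noise at high b-values; the log-linear form is used here only because it is short and exact on noise-free data.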
      The signal-to-noise ratio (SNR) was computed as described by the QIBA protocol [

      Quantitative Imaging Biomarkers Alliance. QIBA Profile : Diffusion-Weighted Magnetic Resonance Imaging (DWI) 2019.

      ] in MICE toolkit v2021.1.0 (NONPI Medical AB, Umea, Sweden) [
      • Nyholm T.
      • Berglund M.
      • Brynolfsson P.
      • Jonsson J.
      EP-1533: ICE-Studio – an interactive visual research tool for image analysis.
      ]. The maps were then spatially resampled in the same software to an isotropic voxel spacing, so that texture features are rotationally invariant [

      Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative.

      ]. Two schemes were analysed in this work, both using linear interpolation: upsampling (0.9 × 0.9 × 0.9 mm³) and downsampling (4 × 4 × 4 mm³). This step ensured an identical volume across scanners during segmentation. The resampled images were then imported into MIM v7.1.5 (MIM Software Inc., Cleveland, OH), where 13 spherical volumes of interest (VOI) with a diameter of 15 mm were drawn in the centre of each vial at the central plane of the phantom. Finally, the delineations were exported into a single DICOM RT Structure Set file per dataset.
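As an illustration of the resampling and VOI steps above (performed in MICE and MIM in the study), the sketch below uses SciPy's `ndimage.zoom` with linear interpolation and builds a 15 mm spherical mask. It is a stand-in under assumptions: `zoom` aligns the corner voxels of the input and output grids, which may differ from the grid alignment used by MICE or recommended by the IBSI, and the helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing_mm, new_spacing_mm):
    """Resample a 3D volume to isotropic voxels with linear interpolation.

    spacing_mm lists the voxel size of each axis, in the same axis order as volume.
    """
    zoom = np.asarray(spacing_mm, dtype=float) / float(new_spacing_mm)
    # order=1 gives (tri)linear interpolation; mode="nearest" avoids zero padding at edges
    return ndimage.zoom(np.asarray(volume, dtype=float), zoom, order=1, mode="nearest")


def spherical_voi(shape, centre_vox, diameter_mm, spacing_mm):
    """Boolean mask of a sphere of the given diameter on an isotropic grid."""
    grids = np.indices(shape, dtype=float)
    dist2_vox = sum((g - c) ** 2 for g, c in zip(grids, centre_vox))
    return dist2_vox * spacing_mm ** 2 <= (diameter_mm / 2.0) ** 2
```

For example, a volume of 10 slices of 20 × 20 pixels at 3.3 × 1.488 × 1.488 mm (scanner B/C/D geometry in Table 1) becomes a 37 × 33 × 33 grid at 0.9 mm, and a 15 mm sphere then covers a radius of about 8.3 voxels.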

      Feature extraction

      RadiomiX Research Toolbox (Radiomics, Liège, Belgium) was used for radiomic processing. To extract the radiomic features, the DICOM data and the corresponding segmentation file (.RTStruct) were imported into the software. Apart from intensity discretisation, no image processing (neither spatial resampling nor normalisation) was performed within the application. Intensity discretisation used a fixed bin size (FBS), set to the average standard deviation of the signal intensity across all VOIs [
      • Orlhac F.
      • Lecler A.
      • Savatovski J.
      • Goya-Outi J.
      • Nioche C.
      • Charbonneau F.
      • et al.
      How can we combat multicenter variability in MR radiomics? Validation of a correction procedure.
      ,
      • Saint Martin M.-J.
      • Orlhac F.
      • Akl P.
      • Khalid F.
      • Nioche C.
      • Buvat I.
      • et al.
      A radiomics pipeline dedicated to Breast MRI: validation on a multi-scanner phantom study.
      ], resulting in an FBS of 10. Calculation settings were left at their default values (GLCM distance of 1).
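Fixed bin size discretisation maps each intensity to an integer grey level. A minimal sketch of the IBSI-style formula, under two assumptions: the lower bound is the minimum intensity in the VOI (no re-segmentation range is given in the text), and "average standard deviation in all VOI" is read as the mean of the per-VOI standard deviations. This is not RadiomiX code, and the function names are hypothetical.

```python
import numpy as np


def fixed_bin_size(values, bin_width, min_value=None):
    """Fixed bin size discretisation: grey level = floor((x - x_min) / w) + 1.

    x_min defaults to the minimum intensity in the VOI.
    """
    values = np.asarray(values, dtype=float)
    x_min = values.min() if min_value is None else float(min_value)
    return np.floor((values - x_min) / bin_width).astype(int) + 1


def mean_voi_sd(vois):
    """Bin width as the average of the per-VOI standard deviations (assumed reading)."""
    return float(np.mean([np.std(v) for v in vois]))
```

With a bin width of 10, intensities 0 and 9.9 fall into grey level 1, 10 into grey level 2 and 25 into grey level 3.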
      Investigated features were categorised as intensity (n = 46) or texture (n = 96) features. Shape features were not analysed because identical VOIs were used for each image. The intensity category comprises 2 local intensity features (LocInt), 19 intensity-based statistical features (Stats) and 25 intensity histogram features (IH). The texture category includes 26 features from the grey level co-occurrence matrix (GLCM), 16 from the grey level run length matrix (GLRLM), 16 from the grey level size zone matrix (GLSZM), 16 from the grey level distance zone matrix (GLDZM), 5 from the neighbourhood grey tone difference matrix (NGTDM) and 17 from the neighbouring grey level dependence matrix (NGLDM).
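To make the texture matrices concrete, the sketch below builds a 2D GLCM at distance 1 for a single direction and derives one feature (contrast). It is a simplified illustration only: radiomics software typically computes 3D matrices over several directions and aggregates them, and the aggregation used by RadiomiX is not stated here, so values from this sketch will not match the study's features. Function names are hypothetical.

```python
import numpy as np


def glcm_2d(img, n_levels, offset=(0, 1)):
    """Symmetric, normalised grey level co-occurrence matrix for one offset.

    img holds discretised grey levels 1..n_levels; offset=(0, 1) pairs each
    pixel with its right-hand neighbour at distance 1. Offsets must be >= 0.
    """
    img = np.asarray(img, dtype=int)
    dr, dc = offset
    if dr < 0 or dc < 0:
        raise ValueError("this sketch only supports non-negative offsets")
    rows, cols = img.shape
    ref = img[: rows - dr, : cols - dc]  # reference pixels
    nbr = img[dr:, dc:]                  # neighbours displaced by the offset
    counts = np.zeros((n_levels, n_levels))
    np.add.at(counts, (ref.ravel() - 1, nbr.ravel() - 1), 1)
    counts = counts + counts.T           # count each pair in both directions
    return counts / counts.sum()


def glcm_contrast(p):
    """GLCM contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))
```

For the 2 × 3 image [[1, 1, 2], [1, 2, 2]], the four horizontal pairs give a symmetric matrix with every entry equal to 0.25 and a contrast of 0.5.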

      ComBat harmonisation

      ComBat is a realignment method that can be directly applied to radiomic features to correct for the scanner effect [
      • Orlhac F.
      • Boughdad S.
      • Philippe C.
      • Stalla-Bourdillon H.
      • Nioche C.
      • Champion L.
      • et al.
      A postreconstruction harmonization method for multicenter radiomic studies in PET.
      ]. It matches the statistical distributions of the feature values measured in VOI j for each scanner i by the equation [
      • Saint Martin M.-J.
      • Orlhac F.
      • Akl P.
      • Khalid F.
      • Nioche C.
      • Buvat I.
      • et al.
      A radiomics pipeline dedicated to Breast MRI: validation on a multi-scanner phantom study.
      ]:
      $$y_{ij} = \alpha + \gamma_i + \delta_i \varepsilon_{ij}$$


      where $\alpha$ is the average value of feature $y_{ij}$, $\gamma_i$ is the additive scanner effect, and $\delta_i$ is the multiplicative scanner effect, which scales the error term $\varepsilon_{ij}$.
      ComBat uses a maximum likelihood approach to estimate $\alpha$, $\gamma_i$ and $\delta_i$, denoted $\hat{\alpha}$, $\hat{\gamma}_i$ and $\hat{\delta}_i$. The normalised value of feature $y_{ij}$ is then obtained by:
      $$y_{ij}^{\mathrm{ComBat}} = \frac{y_{ij} - \hat{\alpha} - \hat{\gamma}_i}{\hat{\delta}_i} + \hat{\alpha}$$


      ComBat was applied in Python v3.7.7 using the neurocombat library v0.2.12 by Fortin et al. [
      • Fortin J.-P.
      • Parker D.
      • Tunç B.
      • Watanabe T.
      • Elliott M.A.
      • Ruparel K.
      • et al.
      Harmonization of multi-site diffusion tensor imaging data.
      ,
      • Fortin J.-P.
      • Cullen N.
      • Sheline Y.I.
      • Taylor W.D.
      • Aselcioglu I.
      • Cook P.A.
      • et al.
      Harmonization of cortical thickness measurements across scanners and sites.
      ], using the non-parametric form of the model without the empirical Bayes step.
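The location/scale adjustment in the model above can be sketched with NumPy for the configuration used here (no empirical Bayes step, no biological covariates). This is a simplified stand-in for illustration, not the neurocombat implementation: scanner effects are plain per-scanner moments of standardised data, and details such as the variance estimators may differ from the library. The parametric/non-parametric distinction does not arise in this sketch because it concerns the empirical Bayes priors. The function name `combat_no_eb` is hypothetical.

```python
import numpy as np


def combat_no_eb(y, batch):
    """Location/scale ComBat without empirical Bayes and without covariates.

    y:     array (n_features, n_samples), e.g. radiomic features x VOIs.
    batch: length-n_samples array of scanner labels.
    Returns values realigned as (y - alpha - gamma_i) / delta_i + alpha.
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    alpha = y.mean(axis=1, keepdims=True)  # grand mean of each feature

    # Pooled SD of the residuals left after removing each scanner's mean
    resid = np.empty_like(y)
    for lab in labels:
        cols = batch == lab
        resid[:, cols] = y[:, cols] - y[:, cols].mean(axis=1, keepdims=True)
    sigma = np.sqrt((resid ** 2).mean(axis=1, keepdims=True))

    z = (y - alpha) / sigma  # standardised features
    out = np.empty_like(y)
    for lab in labels:
        cols = batch == lab
        gamma = z[:, cols].mean(axis=1, keepdims=True)         # additive scanner effect
        delta = z[:, cols].std(axis=1, ddof=1, keepdims=True)  # multiplicative scanner effect
        out[:, cols] = (z[:, cols] - gamma) / delta * sigma + alpha
    return out
```

In the notation of the equations above, $\hat{\gamma}_i$ corresponds to `sigma * gamma` and $\hat{\delta}_i$ to `delta`. If one scanner's values are an exact positive affine transform of another's for the same VOIs, the sketch maps both onto identical values; constant features (zero SD) or scanners with a single sample would need special handling.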

      Statistical analysis

      Statistical analysis was performed in Python v3.7.7.
      Accuracy error was calculated for each vial as:
      $$\text{Accuracy error} = \frac{\lvert \text{Measured} - \text{Reference} \rvert}{\text{Reference}} \times 100\,\%$$


      where the reference values are those specified by the phantom manufacturer and the measured values are the mean ADC after resampling to an isotropic voxel of 0.9 × 0.9 × 0.9 mm³.
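The accuracy error is applied per vial. As a worked example with a hypothetical measurement: for a reference ADC of 1.127 × 10⁻³ mm²/s (the upper end of the phantom range) and a measured mean ADC of 1.10 × 10⁻³ mm²/s, the error is |1.10 - 1.127| / 1.127 × 100 ≈ 2.4 %. A vectorised sketch (the function name is hypothetical):

```python
import numpy as np


def accuracy_error_percent(measured, reference):
    """Absolute accuracy error in percent: |measured - reference| / reference * 100."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.abs(measured - reference) / reference * 100.0
```

Because the inputs are arrays, all vials of one scanner can be evaluated in a single call.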
      For short- and long-term repeatability and reproducibility, the within-subject coefficient of variation (wCV) in percent was calculated as specified in the QIBA recommendations [
      • Shukla‐Dave A.
      • Obuchowski N.A.
      • Chenevert T.L.
      • Jambawalikar S.
      • Schwartz L.H.
      • Malyarenko D.
      • et al.
      Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials.
      ]. It quantifies the intra-subject variability between measurements and thus the minimum detectable difference between two longitudinal measurements (repeatability) or between different imaging methods (reproducibility). When the wCV is small, minimal changes in the feature can be detected, whereas when the wCV is large, considerable variations in the feature are required before one can be confident that a real change has occurred. For the repeatability and reproducibility analyses, features with wCV ≤ 10 % were considered good and suitable for patient analysis; this threshold is the strictest in the literature [
      • Zhang J.
      • Qiu Q.
      • Duan J.
      • Gong G.
      • Jiang Q.
      • Sun G.
      • et al.
      Variability of radiomic features extracted from multi-b-value diffusion-weighted images in hepatocellular carcinoma.
      ,
      • Prabhu V.
      • Gillingham N.
      • Babb J.S.
      • Mali R.D.
      • Rusinek H.
      • Bruno M.T.
      • et al.
      Repeatability, robustness, and reproducibility of texture features on 3 Tesla liver MRI.
      ,
      • Carbonell G.
      • Kennedy P.
      • Bane O.
      • Kirmani A.
      • El Homsi M.
      • Stocker D.
      • et al.
      Precision of MRI radiomics features in the liver and hepatocellular carcinoma.
      ,
      • Mahmood U.
      • Apte A.
      • Kanan C.
      • Bates D.D.B.
      • Corrias G.
      • Manneli L.
      • et al.
      Quality control of radiomic features using 3D-printed CT phantoms.
      ]. Vials with 40 % and 50 % polymer concentration were excluded from these analyses.
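A minimal sketch of the wCV calculation, assuming the root-mean-square form of the per-vial coefficient of variation, wCV = sqrt(mean over vials j of s_j² / m_j²), where s_j and m_j are the sample SD and mean of the repeated measurements of vial j; the QIBA recommendations should be checked for the exact definition. For reproducibility, the "repeats" are the values from the different scanners. Function names are hypothetical.

```python
import numpy as np


def wcv_percent(measurements):
    """Within-subject coefficient of variation (%) for an (n_vials, n_repeats) array."""
    m = np.asarray(measurements, dtype=float)
    means = m.mean(axis=1)
    variances = m.var(axis=1, ddof=1)  # sample variance across repeats
    return 100.0 * float(np.sqrt(np.mean(variances / means ** 2)))


def is_good_feature(measurements, threshold=10.0):
    """Apply the 10 % threshold used in the text (good if wCV <= threshold)."""
    return wcv_percent(measurements) <= threshold
```

For two vials measured as [9, 10, 11] and [18, 20, 22], each vial has a CV of 10 %, so the wCV is 10 %.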
      Additional statistical tests were calculated with the pingouin library v0.3.11 [
      • Vallat R.
      Pingouin: statistics in Python.
      ]. Differences in mean ADC between vial positions in the phantom were tested with a Mann-Whitney U test. Differences in wCV between resampling schemes, and before versus after ComBat, were tested with a paired Wilcoxon signed-rank test. The Friedman test was used to assess the difference between centres for each feature, before and after ComBat realignment. P-values < 0.05 were considered statistically significant.
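The three tests above can be sketched with SciPy. This swaps in `scipy.stats` for illustration; the study used pingouin, which offers `wilcoxon`, `friedman` and `mwu` functions for the same tests. Function names are hypothetical and any input arrays are placeholders, not study data.

```python
import numpy as np
from scipy import stats


def compare_resampling(wcv_a, wcv_b):
    """Paired two-sided Wilcoxon signed-rank test on per-feature wCV values."""
    return stats.wilcoxon(wcv_a, wcv_b).pvalue


def centre_effect(values_by_centre):
    """Friedman test across centres (at least 3); each entry holds one centre's
    values for the same VOIs, in the same order."""
    return stats.friedmanchisquare(*values_by_centre).pvalue


def position_effect(inner, outer):
    """Two-sided Mann-Whitney U test between inner- and outer-circle vials."""
    return stats.mannwhitneyu(inner, outer, alternative="two-sided").pvalue
```

Applied per feature, a Friedman p-value above 0.05 after ComBat corresponds to the "no detectable centre effect" criterion reported in the Results.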

      Results

      Accuracy

      Two issues were identified during the initial visits to two centres, one related to the sequence of scanner A and the other to the software of scanner E; both were resolved promptly. Scanner F images contained N/2 ghosting artefacts, which were deemed acceptable. The SNR results, as well as other QIBA DWI profile requirement measures, can be found in Supplementary Data 1, Table 2.
      After these issues were resolved, the quality control was successful for all scanners. Of the vials with a polymer concentration of 0–30 %, 87 % had < 5 % difference from the ground truth, and all were within 10 % (Fig. 4). The median accuracy error over all vials was 1.3 % for scanner A, 2.7 % for scanner B, 0.9 % for scanner C, 2.6 % for scanner D, 4 % for scanner E and 1.8 % for scanner F. The difference in mean ADC between the inner and outer circles of vials was not statistically significant.
      Fig. 4. Accuracy of the vials compared with the manufacturer's ground truth. Error bars represent the standard deviation of the intensity values inside each volume of interest.

      Repeatability

      For short-term repeatability, the mean ADC had a wCV between 0.1 % and 1 %, depending on the scanner, for both resampling schemes. The percentage of radiomic features with good short-term repeatability (wCV ≤ 10 %) across all six scanners was 24 % (15/46 intensity features and 19/96 texture features) with a 0.9 mm resampled voxel spacing and 15 % (10/46 intensity features and 12/96 texture features) with a 4 mm resampled voxel spacing, with a significant difference in wCV between resampling schemes (p-value < 0.001). Results for each scanner are shown in Fig. 5.
      Fig. 5. Short-term repeatability analysis for all six scanners using the NIST/QIBA phantom. Abbreviations: wCV = within-subject Coefficient of Variation.
      For long-term repeatability, assessed on scanner B, the mean ADC had a wCV of 1.3 % for both resampling schemes. For intensity features, 22 % (10/46) were repeatable with both resampling schemes; for texture features, 19 % (18/96) and 13 % (12/96) were repeatable with resampled voxel spacings of 0.9 mm and 4 mm, respectively. These results are slightly worse than the short-term repeatability results for scanner B. The difference in wCV between resampling schemes was significant (p-value < 0.05).

      Reproducibility

      The mean ADC had a wCV of 2.7 % for both resampling approaches. The percentage of radiomic features with good reproducibility (wCV ≤ 10 %) was 13 % (10/46 intensity features and 8/96 texture features) with 0.9 mm resampling and 11 % (10/46 intensity features and 6/96 texture features) with 4 mm resampling, with a significant difference in wCV between resampling schemes (p-value < 0.01). Only five categories (LocInt, Stats, GLCM, GLRLM and GLDZM) contained reproducible features. All features with good reproducibility also showed good long-term repeatability.

      ComBat

      Without harmonisation, 18 of the 142 p-values obtained when testing the difference between centres for each feature were greater than 0.05. After ComBat, all 142 p-values were greater than 0.05, indicating that a scanner effect was present but could be corrected. The wCV improved significantly (p-value < 0.001) compared with reproducibility without ComBat. However, the percentage of radiomic features with good reproducibility (wCV ≤ 10 %) remained low: 15 % (10/46 intensity features and 12/96 texture features) for a 0.9 mm isotropic voxel spacing and 14 % (10/46 intensity features and 10/96 texture features) for a 4 mm isotropic voxel spacing, with no significant difference in wCV between resampling schemes (p-value = 0.23). Good realignment between centres was observed qualitatively (Fig. 6). Variation above 10 %, however, persisted for most features: only 4 texture features (for both resampling schemes) passed the 10 % threshold in addition to those already reproducible without ComBat.
      Fig. 6. Realignment of (a) NGLDM_LDE and (b) GLCM_diffVar features using ComBat to remove the scanner effect. Abbreviation: wCV = within-subject Coefficient of Variation.
      All the wCV results for repeatability and reproducibility before and after ComBat can be found in the Supplementary Material 2.

      Discussion

      The first objective of this work was to perform quality assurance of DWI with the NIST/QIBA diffusion phantom as part of the REGINA trial. Not every institution could meet all the constraints of the REGINA protocol parameters. This is in line with prior evidence showing that a protocol that is optimal for one platform is not always feasible on all platforms [
      • Taouli B.
      • Beer A.J.
      • Chenevert T.
      • Collins D.
      • Lehman C.
      • Matos C.
      • et al.
      Diffusion-weighted imaging outside the brain: Consensus statement from an ISMRM-sponsored workshop.
      ]. Parameters such as TE and TR were, however, less variable than with site-specific protocols [
      • Jafar M.M.
      • Parsai A.
      • Miquel M.E.
      Diffusion-weighted magnetic resonance imaging in cancer: reported apparent diffusion coefficients, in-vitro and in-vivo reproducibility.
      ,
      • Chenevert T.L.
      • Galbán C.J.
      • Ivancevic M.K.
      • Rohrer S.E.
      • Londy F.J.
      • Kwee T.C.
      • et al.
      Diffusion coefficient measurement using a temperature-controlled fluid for quality control in multicenter studies.
      ]. Nevertheless, these differences may have an impact on the results, in particular for MRI radiomic feature values which depend on many parameters, including voxel size and SNR [
      • Roy S.
      • Whitehead T.D.
      • Quirk J.D.
      • Salter A.
      • Ademuyiwa F.O.
      • Li S.
      • et al.
      Optimal co-clinical radiomics: Sensitivity of radiomic features to tumour volume, image noise and resolution in co-clinical T1-weighted and T2-weighted magnetic resonance imaging.
      ,
      • Cattell R.
      • Chen S.
      • Huang C.
      Robustness of radiomic features in magnetic resonance imaging: review and a phantom study.
      ]. This led to the second objective of our study: to assess the variability of radiomic features between different scanners.
      The quality assurance highlighted different issues on two scanners. The mean ADC computed for scanner E had an accuracy error of 25 % for the central vial (0 % polymer) when it was computed with the manufacturer's accompanying software. However, when the ADC map was calculated with our in-house software, a mean ADC in the correct range was observed. This supports computing the ADC maps on a single software platform, especially in the context of a clinical trial. The second issue was a numeric overflow in the low b-value images (0 to 400 s/mm²) of scanner A for the phantom acquisition. It disappeared after the receiver image scaling was changed from three to one, and the mean ADC in each vial was then more consistent with the manufacturer's values. This problem does not occur in patients, likely because of the higher concentration of protons, and thus signal, in the phantom. For this reason, the sequences were adjusted only for phantom acquisition.
      The accuracy and the short- and long-term repeatability and reproducibility of the mean ADC were good for the vials with 0 % to 30 % polymer concentration, comparable with results in the literature [Chenevert et al., 2011; Kooreman et al., 2019; Belli et al., 2016; Palacios et al., 2017; Jerome et al., 2016; Wang et al., 2021; Grech-Sollars et al., 2015; Malyarenko et al., 2013; Carr et al., 2022]. Accuracy was poorer for the vials with higher polymer concentration, as also observed in other studies using the NIST/QIBA diffusion phantom [Kooreman et al., 2019; Palacios et al., 2017; Carr et al., 2022]. This behaviour could be due to the lower diffusivity in those vials. Accuracy in these vials was also worse for scanners E and F, whose b = 0 images had the lowest SNR, possibly because of a higher bandwidth and a lower number of signal averages. Insufficient SNR may have affected the ADC estimates in the vials with the highest polymer concentrations. Images from scanner F also showed an N/2 ghosting artefact, possibly caused by eddy currents, which explains the substantial outlier for the 50 % vials of the outer circle. However, the ADC values of these concentrations are below those observed in human tissue (>0.5 × 10⁻³ mm²/s) [Jafar et al., 2016]. It has also been reported that ADC values can be strongly influenced by the offset position of the ROI in the magnet, because of the gradient nonlinearity present in each system [Malyarenko et al., 2016; Fedeli et al., 2018; Fedeli et al., 2021]. However, this effect did not apply to the ROI positions in the phantom in this study.
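The accuracy error discussed above can be sketched as the relative deviation of the ROI-mean ADC from the reference value of each vial. The snippet assumes a percentage definition, |mean ADC − reference| / reference × 100, which is consistent with the percentages reported in the abstract but may differ in detail from the study's exact formula. The function name `accuracy_error_pct`, the vial masks and the reference values are hypothetical placeholders, not the NIST/QIBA phantom specifications.

```python
import numpy as np


def accuracy_error_pct(adc: np.ndarray, roi: np.ndarray, adc_ref: float) -> float:
    """Absolute relative error (%) of the ROI-mean ADC against a reference ADC.

    adc:     ADC map (any shape), in mm^2/s.
    roi:     boolean mask of the same shape selecting the vial ROI.
    adc_ref: reference ADC of the vial, in mm^2/s.
    """
    mean_adc = float(np.mean(adc[roi]))
    return 100.0 * abs(mean_adc - adc_ref) / adc_ref


# Hypothetical 2D ADC map with two non-overlapping circular vial ROIs.
yy, xx = np.mgrid[0:64, 0:64]
vial_a = (yy - 20) ** 2 + (xx - 20) ** 2 < 36
vial_b = (yy - 44) ** 2 + (xx - 44) ** 2 < 36
adc = np.zeros((64, 64))
adc[vial_a] = 1.08e-3
adc[vial_b] = 0.55e-3

reference = {"vial_a": 1.10e-3, "vial_b": 0.50e-3}  # placeholder reference values
for name, mask in [("vial_a", vial_a), ("vial_b", vial_b)]:
    print(name, round(accuracy_error_pct(adc, mask, reference[name]), 2), "%")
```

Because the error is relative, the same absolute ADC deviation weighs more heavily in low-diffusivity vials, which is one reason to report it per vial rather than pooled.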
      Radiomic features from the ADC maps were less robust than the mean ADC. Even though the phantom was not moved from the table between scans, very few features were repeatable. For scanner B, a long-term repeatability analysis with a one-week interval gave results only slightly worse than the short-term repeatability analysis. The intensity and GLCM feature categories gave the best results, consistent with previous studies [Carbonell et al., 2022; Yuan et al., 2021]. More features were repeatable when resampling to a smaller voxel size, as was also reported in a PET/CT phantom study [Pfaehler et al., 2019]. To our knowledge, Dreher et al. [2020] are the only other group to have assessed radiomic feature repeatability in ADC maps of a phantom. They reported high feature repeatability, but a direct comparison with our results would be inappropriate given the large differences in study design (i.e., a different phantom and a different repeatability indicator).
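The resampling step referred to above, to an isotropic voxel spacing before feature extraction, can be sketched with SciPy's `scipy.ndimage.zoom` using first-order (linear) spline interpolation. This is an assumed stand-in for the software actually used in the study. The function name `resample_isotropic`, the input spacing and the array size are hypothetical, while the 0.9 mm and 4 mm targets mirror the two spacings described in the abstract. Note that `zoom` rounds the output shape, so the effective spacing can differ slightly from the requested one.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(image: np.ndarray, spacing_mm, target_mm: float) -> np.ndarray:
    """Linearly interpolate a 3D image to an isotropic voxel spacing (mm)."""
    factors = [s / target_mm for s in spacing_mm]
    # order=1 selects linear interpolation; mode="nearest" repeats edge voxels
    # if a sample falls marginally outside the image.
    return ndimage.zoom(image, zoom=factors, order=1, mode="nearest")


# Hypothetical ADC map: 20 slices of 40 x 40 voxels, 4 mm slices and 1.5 mm in-plane spacing.
spacing = (4.0, 1.5, 1.5)
rng = np.random.default_rng(0)
adc = 1.0e-3 + 1.0e-5 * rng.standard_normal((20, 40, 40))

fine = resample_isotropic(adc, spacing, 0.9)    # upsampling to 0.9 mm
coarse = resample_isotropic(adc, spacing, 4.0)  # downsampling in-plane to 4 mm
print(fine.shape, coarse.shape)
```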
      By acquiring diffusion phantom data on six different scanners, we also showed that the ComBat procedure was able to realign the distributions of radiomic features extracted from ADC maps. ComBat reduced the high inter-scanner variation in radiomic features, but not enough for most features to fall below the pre-defined threshold (wCV < 10 %). These results indicate that ComBat cannot overcome the variability of all features, especially when there is a large discrepancy between scanners, as shown in Fig. 6b. The success of harmonisation should therefore not be assessed only with statistical tests, but also with a variation indicator such as the wCV, and by inspecting the data as closely as possible. Although ComBat does not require phantom experiments, a phantom study makes it possible to check which features can be harmonised with ComBat, as was also done in a previous study using CT phantom images [Ibrahim et al., 2021], and to select only the "ComBatable" features for future clinical studies using that method. Other harmonisation approaches could also be explored, e.g. normalisation through deep learning, which is increasingly being studied [Mali et al., 2021; Zegers et al., 2021].
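For readers unfamiliar with ComBat, the snippet below is a compact NumPy sketch of its parametric empirical-Bayes location/scale adjustment, in the form used for imaging data by Fortin et al. [2017]. It assumes no biological covariates (a phantom has none), at least two samples (e.g. ROIs) per scanner and several features. It is not the implementation used in the study, and validated packages such as `neuroCombat` should be preferred in practice. The function name `combat`, the scanner labels, shifts and spreads are all hypothetical and the data are synthetic.

```python
import numpy as np


def combat(data: np.ndarray, batch: np.ndarray, tol: float = 1e-4, max_iter: int = 1000) -> np.ndarray:
    """Parametric empirical-Bayes ComBat without covariates.

    data:  (n_features, n_samples) array, e.g. radiomic features x ROIs.
    batch: (n_samples,) scanner label of each sample; each scanner needs >= 2 samples.
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    masks = [batch == lev for lev in np.unique(batch)]
    sizes = np.array([m.sum() for m in masks])

    # Per-scanner feature means and the sample-size-weighted grand mean.
    batch_means = np.stack([data[:, m].mean(axis=1) for m in masks], axis=1)
    grand_mean = batch_means @ (sizes / sizes.sum())

    # Pooled residual variance after removing each scanner's mean, then standardise.
    resid = data.copy()
    for k, m in enumerate(masks):
        resid[:, m] -= batch_means[:, [k]]
    sd_pooled = np.sqrt((resid ** 2).mean(axis=1))[:, None]
    z = (data - grand_mean[:, None]) / sd_pooled

    # Per-scanner location (gamma) and scale (delta, a variance) estimates per feature.
    gamma_hat = np.stack([z[:, m].mean(axis=1) for m in masks], axis=1)
    delta_hat = np.stack([z[:, m].var(axis=1, ddof=1) for m in masks], axis=1)

    # Normal prior on gamma and inverse-gamma prior on delta, pooled over features.
    gamma_bar = gamma_hat.mean(axis=0)
    tau2 = gamma_hat.var(axis=0, ddof=1)
    m_d = delta_hat.mean(axis=0)
    s2_d = delta_hat.var(axis=0, ddof=1)
    a_prior = (2 * s2_d + m_d ** 2) / s2_d
    b_prior = (m_d * s2_d + m_d ** 3) / s2_d

    adjusted = np.empty_like(z)
    for k, m in enumerate(masks):
        zk, n = z[:, m], sizes[k]
        g_old, d_old = gamma_hat[:, k], delta_hat[:, k]
        for _ in range(max_iter):
            # Alternate the posterior mean of gamma and the posterior estimate of delta.
            g_new = (n * tau2[k] * gamma_hat[:, k] + d_old * gamma_bar[k]) / (n * tau2[k] + d_old)
            ssq = ((zk - g_new[:, None]) ** 2).sum(axis=1)
            d_new = (0.5 * ssq + b_prior[k]) / (n / 2 + a_prior[k] - 1)
            change = max(np.max(np.abs(g_new - g_old) / (np.abs(g_old) + 1e-12)),
                         np.max(np.abs(d_new - d_old) / d_old))
            g_old, d_old = g_new, d_new
            if change < tol:
                break
        adjusted[:, m] = (zk - g_old[:, None]) / np.sqrt(d_old)[:, None]

    # Return to the original feature scale.
    return adjusted * sd_pooled + grand_mean[:, None]


# Synthetic example: 30 features on 8 ROIs per scanner, with scanner-specific shift and spread.
rng = np.random.default_rng(0)
batch = np.repeat(np.array(["A", "B", "C"]), 8)
shift = {"A": 0.0, "B": 3.0, "C": -2.0}
spread = {"A": 1.0, "B": 2.0, "C": 0.5}
noise = rng.standard_normal((30, batch.size))
data = np.empty_like(noise)
for lev in shift:
    idx = batch == lev
    data[:, idx] = 10.0 + shift[lev] + spread[lev] * noise[:, idx]

harmonised = combat(data, batch)


def batch_mean_range(x: np.ndarray) -> float:
    """Average, over features, of the range of per-scanner means."""
    means = np.stack([x[:, batch == lev].mean(axis=1) for lev in shift], axis=1)
    return float(np.mean(means.max(axis=1) - means.min(axis=1)))


print(batch_mean_range(data), batch_mean_range(harmonised))
```

Shrinking each scanner's estimates towards the mean over features is what distinguishes ComBat from a plain per-feature standardisation. It helps when few ROIs are available per scanner, but, as observed in this study, it cannot guarantee a low wCV when scanners disagree strongly.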
      A limitation of this study is the use of the NIST/QIBA diffusion phantom for radiomic analysis, as it is not a heterogeneous phantom. As shown by Mackin et al. [2015] for CT, different textural patterns are not affected in the same way by differences between scanners, and the design of the phantom affects the reproducibility observed between scanners [Li et al., 2022]. Our results for texture features on a homogeneous phantom are therefore not directly transferable to patients, but they nevertheless provide a first step towards investigating radiomic features in DWI. Efforts to create heterogeneous phantoms for the standardisation of radiomic features are ongoing [Valladares et al., 2020]. It should also be noted that the within-subject coefficient of variation (wCV), which was used in this study to evaluate repeatability and reproducibility, does not assess the variation between different ROIs. Features with low variation between scanners may therefore also show low variation between patients, in which case they would not be useful in a radiomic model. Combining the wCV with an indicator that quantifies inter-scanner variability relative to inter-tumour variability (e.g. the intraclass correlation coefficient) would be preferable, provided that the texture of the ROIs varies in a way that resembles what might be observed in patients. Finally, systems with different field strengths were not included in this study, as all scanners participating in the REGINA trial operate at 3 T. However, previous studies [Chenevert et al., 2011; Belli et al., 2016; Jerome et al., 2016; Malyarenko et al., 2013] showed that the ADC is invariant to field strength, as long as the b-values are not too high [Belli et al., 2016].
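The limitation of the wCV can be made concrete with a short sketch. Two hypothetical features below share the same scanner-to-scanner noise, so their wCV is similarly low, but only the second varies between ROIs, and an intraclass correlation coefficient separates them. Both definitions are assumptions: the wCV is computed as the root mean square of the per-ROI coefficients of variation, and the ICC as the two-way random-effects, absolute-agreement, single-measurement ICC(2,1). The study's own wCV formula may differ. The function names and data are hypothetical; Pingouin, which the study cites, also offers `pingouin.intraclass_corr` for the ICC.

```python
import numpy as np


def wcv(x: np.ndarray) -> float:
    """Root mean square of the per-ROI coefficients of variation.

    x: (n_rois, n_measurements) array, e.g. one feature measured on every scanner.
    """
    x = np.asarray(x, dtype=float)
    cv = x.std(axis=1, ddof=1) / x.mean(axis=1)
    return float(np.sqrt(np.mean(cv ** 2)))


def icc_2_1(x: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-measurement ICC(2,1)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between ROIs
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between scanners
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float((ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))


# Synthetic data: 5 ROIs x 6 scanners with identical scanner noise (sd = 2).
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 2.0, size=(5, 6))
flat = 100.0 + noise                                                   # ROIs indistinguishable
varied = np.array([50.0, 75.0, 100.0, 125.0, 150.0])[:, None] + noise  # ROIs differ

for name, feat in [("flat", flat), ("varied", varied)]:
    print(name, round(wcv(feat), 3), round(icc_2_1(feat), 3))
```

Both features would pass a wCV < 10 % threshold, but only the second has an ICC close to 1, i.e. scanner noise that is small relative to the differences between ROIs.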

      Conclusion

      As part of the ongoing REGINA trial, the accuracy, repeatability and reproducibility of the mean ADC were assessed using the NIST/QIBA diffusion phantom across six MRI systems. The quality assurance was successful for all centres and showed that the mean ADC can be largely standardised for use in prediction within the trial. Radiomic features, however, remain fragile even in images with good standardisation and post-processing harmonisation.

      Declaration of Competing Interest

      The radiology and medical physics departments of Institut Jules Bordet have a research agreement with Siemens. The authors declare that they have no other known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgements

      We thank the centres that participated in this study and welcomed us for the phantom measurements in their radiology departments: AZ Groeninge (Kortrijk, Belgium), Centre Hospitalier Régional Sambre et Meuse (Namur, Belgium), Erasme (Bruxelles, Belgium), Grand Hôpital de Charleroi (Charleroi, Belgium).

      Funding

      Zelda Paquier has a research grant from “L’Association Jules Bordet”. The REGINA trial is funded by Bayer.

      Appendix A. Supplementary data

      The following are the Supplementary data to this article:

      References

        Drake-Pérez M, Boto J, Fitsiori A, Lovblad K, Vargas MI. Clinical applications of diffusion weighted imaging in neuroradiology. Insights Imaging 2018;9:535-547. https://doi.org/10.1007/s13244-018-0624-3
        Weinreb JC, Barentsz JO, Choyke PL, Cornud F, Haider MA, Macura KJ, et al. PI-RADS prostate imaging – reporting and data system: 2015, Version 2. Eur Urol 2016;69:16-40. https://doi.org/10.1016/j.eururo.2015.08.052
        Fornasa F. Diffusion-weighted magnetic resonance imaging: what makes water run fast or slow? J Clin Imaging Sci 2011;1:27. https://doi.org/10.4103/2156-7514.81294
        Koh D-M, Collins DJ. Diffusion-weighted MRI in the body: applications and challenges in oncology. Am J Roentgenol 2007;188:1622-1635. https://doi.org/10.2214/AJR.06.1403
        Padhani AR, Liu G, Mu-Koh D, Chenevert TL, Thoeny HC, Takahara T, et al. Diffusion-weighted magnetic resonance imaging as a cancer biomarker: consensus and recommendations. Neoplasia 2009;11:102-125. https://doi.org/10.1593/neo.81328
        Mytsyk Y, Pasichnyk S, Dutka I, Dats I, Vorobets D, Skrzypczyk M, et al. Systemic treatment of the metastatic renal cell carcinoma: usefulness of the apparent diffusion coefficient of diffusion-weighted MRI in prediction of early therapeutic response. Clin Exp Med 2020;20:277-287. https://doi.org/10.1007/s10238-020-00612-9
        Galbán CJ, Hoff BA, Chenevert TL, Ross BD. Diffusion MRI in early cancer therapeutic response assessment. NMR Biomed 2017;30:e3458.
        Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology 2016;278:563-577. https://doi.org/10.1148/radiol.2015151169
        Bickelhaupt S, Jaeger PF, Laun FB, Lederer W, Daniel H, Kuder TA, et al. Radiomics based on adapted diffusion kurtosis imaging helps to clarify most mammographic findings suspicious for cancer. Radiology 2018;287:761-770. https://doi.org/10.1148/radiol.2017170273
        Cusumano D, Boldrini L, Dhont J, Fiorino C, Green O, Güngör G, et al. Artificial Intelligence in magnetic Resonance guided Radiotherapy: medical and physical considerations on state of art and future perspectives. Phys Med 2021;85:175-191. https://doi.org/10.1016/j.ejmp.2021.05.010
        Jafar MM, Parsai A, Miquel ME. Diffusion-weighted magnetic resonance imaging in cancer: reported apparent diffusion coefficients, in-vitro and in-vivo reproducibility. World J Radiol 2016;8:21-49. https://doi.org/10.4329/wjr.v8.i1.21
        Yip SSF, Aerts HJWL. Applications and limitations of radiomics. Phys Med Biol 2016;61:R150-R166. https://doi.org/10.1088/0031-9155/61/13/R150
        O'Connor JPB, Aboagye EO, Adams JE, Aerts HJWL, Barrington SF, Beer AJ, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol 2017;14:169-186. https://doi.org/10.1038/nrclinonc.2016.162
        Shukla-Dave A, Obuchowski NA, Chenevert TL, Jambawalikar S, Schwartz LH, Malyarenko D, et al. Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials. J Magn Reson Imaging 2019;49:e101-e121. https://doi.org/10.1002/jmri.26518
        Orlhac F, Boughdad S, Philippe C, Stalla-Bourdillon H, Nioche C, Champion L, et al. A postreconstruction harmonization method for multicenter radiomic studies in PET. J Nucl Med 2018;59:1321-1328. https://doi.org/10.2967/jnumed.117.199935
        Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I. Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 2019;291:53-59. https://doi.org/10.1148/radiol.2019182023
        Orlhac F, Lecler A, Savatovski J, Goya-Outi J, Nioche C, Charbonneau F, et al. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol 2021;31:2272-2280. https://doi.org/10.1007/s00330-020-07284-9
        Saint Martin M-J, Orlhac F, Akl P, Khalid F, Nioche C, Buvat I, et al. A radiomics pipeline dedicated to Breast MRI: validation on a multi-scanner phantom study. Magn Reson Mater Physics, Biol Med 2021;34:355-366. https://doi.org/10.1007/s10334-020-00892-y
        Ibrahim A, Refaee T, Leijenaar RTH, Primakov S, Hustinx R, Mottaghy FM, et al. The application of a workflow integrating the variable reproducibility and harmonizability of radiomic features on a phantom dataset. PLoS One 2021;16:e0251147. https://doi.org/10.1371/journal.pone.0251147
        Bregni G, Vandeputte C, Pretta A, Senti C, Trevisi E, Acedo Reina E, et al. Rationale and design of REGINA, a phase II trial of neoadjuvant regorafenib, nivolumab, and short-course radiotherapy in stage II and III rectal cancer. Acta Oncol (Madr) 2021;60:549-553. https://doi.org/10.1080/0284186X.2020.1871067
        Quantitative Imaging Biomarkers Alliance. QIBA Profile: Diffusion-Weighted Magnetic Resonance Imaging (DWI). 2019.
        Nyholm T, Berglund M, Brynolfsson P, Jonsson J. EP-1533: ICE-Studio – an interactive visual research tool for image analysis. Radiother Oncol 2015;115:S837. https://doi.org/10.1016/s0167-8140(15)41525-7
        Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative.
        Fortin J-P, Parker D, Tunç B, Watanabe T, Elliott MA, Ruparel K, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017;161:149-170. https://doi.org/10.1016/j.neuroimage.2017.08.047
        Fortin J-P, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage 2018;167:104-120. https://doi.org/10.1016/j.neuroimage.2017.11.024
        Zhang J, Qiu Q, Duan J, Gong G, Jiang Q, Sun G, et al. Variability of radiomic features extracted from multi-b-value diffusion-weighted images in hepatocellular carcinoma. Transl Cancer Res 2019;8:130-140. https://doi.org/10.21037/tcr.2019.01.14
        Prabhu V, Gillingham N, Babb JS, Mali RD, Rusinek H, Bruno MT, et al. Repeatability, robustness, and reproducibility of texture features on 3 Tesla liver MRI. Clin Imaging 2022;83:177-183. https://doi.org/10.1016/j.clinimag.2022.01.002
        Carbonell G, Kennedy P, Bane O, Kirmani A, El Homsi M, Stocker D, et al. Precision of MRI radiomics features in the liver and hepatocellular carcinoma. Eur Radiol 2022;32:2030-2040. https://doi.org/10.1007/s00330-021-08282-1
        Mahmood U, Apte A, Kanan C, Bates DDB, Corrias G, Manneli L, et al. Quality control of radiomic features using 3D-printed CT phantoms. J Med Imaging 2021;8. https://doi.org/10.1117/1.JMI.8.3.033505
        Vallat R. Pingouin: statistics in Python. J Open Source Softw 2018;3:1026. https://doi.org/10.21105/joss.01026
        Taouli B, Beer AJ, Chenevert T, Collins D, Lehman C, Matos C, et al. Diffusion-weighted imaging outside the brain: Consensus statement from an ISMRM-sponsored workshop. J Magn Reson Imaging 2016;44:521-540. https://doi.org/10.1002/jmri.25196
        Chenevert TL, Galbán CJ, Ivancevic MK, Rohrer SE, Londy FJ, Kwee TC, et al. Diffusion coefficient measurement using a temperature-controlled fluid for quality control in multicenter studies. J Magn Reson Imaging 2011;34:983-987. https://doi.org/10.1002/jmri.22363
        Roy S, Whitehead TD, Quirk JD, Salter A, Ademuyiwa FO, Li S, et al. Optimal co-clinical radiomics: Sensitivity of radiomic features to tumour volume, image noise and resolution in co-clinical T1-weighted and T2-weighted magnetic resonance imaging. EBioMedicine 2020;59:102963.
        Cattell R, Chen S, Huang C. Robustness of radiomic features in magnetic resonance imaging: review and a phantom study. Vis Comput Ind Biomed Art 2019;2. https://doi.org/10.1186/s42492-019-0025-6
        Kooreman ES, van Houdt PJ, Nowee ME, van Pelt VWJ, Tijssen RHN, Paulson ES, et al. Feasibility and accuracy of quantitative imaging on a 1.5 T MR-linear accelerator. Radiother Oncol 2019;133:156-162. https://doi.org/10.1016/j.radonc.2019.01.011
        Belli G, Busoni S, Ciccarone A, Coniglio A, Esposito M, Giannelli M, et al. Quality assurance multicenter comparison of different MR scanners for quantitative diffusion-weighted imaging. J Magn Reson Imaging 2016;43:213-219. https://doi.org/10.1002/jmri.24956
        Palacios EM, Martin AJ, Boss MA, Ezekiel F, Chang YS, Yuh EL, et al. Toward precision and reproducibility of diffusion tensor imaging: a multicenter diffusion phantom and traveling volunteer study. Am J Neuroradiol 2017;38:537-545. https://doi.org/10.3174/ajnr.A5025
        Jerome NP, Papoutsaki M-V, Orton MR, Parkes HG, Winfield JM, Boss MA, et al. Development of a temperature-controlled phantom for magnetic resonance quality assurance of diffusion, dynamic, and relaxometry measurements. Med Phys 2016;43:2998-3007. https://doi.org/10.1118/1.4948997
        Wang Y, Tadimalla S, Rai R, Goodwin J, Foster S, Liney G, et al. Quantitative MRI: defining repeatability, reproducibility and accuracy for prostate cancer imaging biomarker development. Magn Reson Imaging 2021;77:169-179. https://doi.org/10.1016/j.mri.2020.12.018
        Grech-Sollars M, Hales PW, Miyazaki K, Raschke F, Rodriguez D, Wilson M, et al. Multi-centre reproducibility of diffusion MRI parameters for clinical sequences in the brain. NMR Biomed 2015;28:468-485. https://doi.org/10.1002/nbm.3269
        Malyarenko D, Galbán CJ, Londy FJ, Meyer CR, Johnson TD, Rehemtulla A, et al. Multi-system repeatability and reproducibility of apparent diffusion coefficient measurement using an ice-water phantom. J Magn Reson Imaging 2013;37:1238-1246. https://doi.org/10.1002/jmri.23825
        Carr ME, Keenan KE, Rai R, Boss MA, Metcalfe P, Walker A, et al. Conformance of a 3T radiotherapy MRI scanner to the QIBA diffusion profile. Med Phys 2022;49:4508-4517. https://doi.org/10.1002/mp.15645
        Malyarenko DI, Newitt D, Wilmes LJ, Tudorica A, Helmer KG, Arlinghaus LR, et al. Demonstration of nonlinearity bias in the measurement of the apparent diffusion coefficient in multicenter trials. Magn Reson Med 2016;75:1312-1323. https://doi.org/10.1002/mrm.25754
        Fedeli L, Belli G, Ciccarone A, Coniglio A, Esposito M, Giannelli M, et al. Dependence of apparent diffusion coefficient measurement on diffusion gradient direction and spatial position – a quality assurance intercomparison study of forty-four scanners for quantitative diffusion-weighted imaging. Phys Med 2018;55:135-141. https://doi.org/10.1016/j.ejmp.2018.09.007
        Fedeli L, Benelli M, Busoni S, Belli G, Ciccarone A, Coniglio A, et al. On the dependence of quantitative diffusion-weighted imaging on scanner system characteristics and acquisition parameters: a large multicenter and multiparametric phantom study with unsupervised clustering analysis. Phys Med 2021;85:98-106. https://doi.org/10.1016/j.ejmp.2021.04.020
        Yuan J, Xue C, Lo G, Wong OL, Zhou Y, Yu SK, et al. Quantitative assessment of acquisition imaging parameters on MRI radiomics features: a prospective anthropomorphic phantom study using a 3D–T2W-TSE sequence for MR-guided-radiotherapy. Quant Imaging Med Surg 2021;11:1870-1887. https://doi.org/10.21037/qims-20-865
        Pfaehler E, Beukinga RJ, Jong JR, Slart RHJA, Slump CH, Dierckx RAJO, et al. Repeatability of 18F-FDG PET radiomic features: A phantom study to explore sensitivity to image reconstruction settings, noise, and delineation method. Med Phys 2019;46:665-678. https://doi.org/10.1002/mp.13322
        Dreher C, Kuder TA, König F, Mlynarska-Bujny A, Tenconi C, Paech D, et al. Radiomics in diffusion data: a test–retest, inter- and intra-reader DWI phantom study. Clin Radiol 2020;75:798.e13-798.e22. https://doi.org/10.1016/j.crad.2020.06.024
        Mali SA, Ibrahim A, Woodruff HC, Andrearczyk V, Müller H, Primakov S, et al. Making radiomics more reproducible across scanner and imaging protocol variations: a review of harmonization methods. J Pers Med 2021;11:842.
        Zegers CML, Posch J, Traverso A, Eekers D, Postma AA, Backes W, et al. Current applications of deep-learning in neuro-oncological MRI. Phys Med 2021;83:161-173. https://doi.org/10.1016/j.ejmp.2021.03.003
        Mackin D, Fave X, Zhang L, Fried D, Yang J, Taylor B, et al. Measuring computed tomography scanner variability of radiomics features. Invest Radiol 2015;50:757-765. https://doi.org/10.1097/RLI.0000000000000180
        Li Y, Reyhan M, Zhang Y, Wang X, Zhou J, Zhang Y, et al. The impact of phantom design and material-dependence on repeatability and reproducibility of CT-based radiomics features. Med Phys 2022;49:1648-1659. https://doi.org/10.1002/mp.15491
        Valladares A, Beyer T, Rausch I. Physical imaging phantoms for simulation of tumor heterogeneity in PET, CT, and MRI: an overview of existing designs. Med Phys 2020;47:2023-2037. https://doi.org/10.1002/mp.14045