Research Article | Volume 107, 102538, March 2023

Deriving quantitative information from multiparametric MRI via Radiomics: Evaluation of the robustness and predictive value of radiomic features in the discrimination of low-grade versus high-grade gliomas with machine learning

Published: February 14, 2023. DOI: https://doi.org/10.1016/j.ejmp.2023.102538

      Highlights

      • A reliable feature extraction pipeline for multiparametric MRI data was described.
      • Image preprocessing is a fundamental step in radiomic MRI analysis.
      • A set of features robust with respect to the image preprocessing technique and feature extraction settings was found.
      • The robustness of radiomic features and the impact on their predictive power were investigated.

      Abstract

      Purpose

      Analysis pipelines based on the computation of radiomic features on medical images are widely used exploration tools across a large variety of image modalities. This study aims to define a robust processing pipeline based on Radiomics and Machine Learning (ML) to analyze multiparametric Magnetic Resonance Imaging (MRI) data to discriminate between high-grade (HGG) and low-grade (LGG) gliomas.

      Methods

      The dataset consists of 158 multiparametric MRI scans of patients with brain tumors, publicly available on The Cancer Imaging Archive and preprocessed by the BraTS organization committee. Three different image intensity normalization algorithms were applied, and 107 features were extracted for each tumor region, discretizing the intensity values at different levels. The predictive power of radiomic features in the LGG versus HGG categorization was evaluated using random forest classifiers. The impact of the normalization techniques and of the different image discretization settings was studied in terms of classification performance. A set of MRI-reliable features was defined by selecting the features extracted with the most appropriate normalization and discretization settings.

      Results

      The results show that using MRI-reliable features improves the performance in glioma grade classification (AUC=0.93±0.05) with respect to the use of raw (AUC=0.88±0.08) and robust features (AUC=0.83±0.08), defined as those not depending on image normalization and intensity discretization.

      Conclusions

      These results confirm that image normalization and intensity discretization strongly impact the performance of ML classifiers based on radiomic features. Thus, special attention should be paid to the image preprocessing step before typical radiomic and ML analyses are carried out.


      1. Introduction

      Analysis techniques based on Radiomics and machine learning (ML) nowadays offer great potential in medical imaging research, since they can derive large amounts of quantitative features from images and process them to produce clinically meaningful output. These approaches can be exploited to make predictions about the health status or outcome of individual subjects, in order to foster a precision medicine approach [Papadimitroulas et al., Artificial intelligence: Deep learning in oncological radiomics and challenges of interpretability and data harmonization; Castiglioni et al., AI applications to medical images: From machine learning to deep learning].
      A large part of radiomic applications concerns the oncological field, mainly lung, head and neck, prostate, brain and breast cancers. A variety of analysis pipelines have been designed to process data acquired with the main imaging modalities (CT, MRI, PET, US) [Avanzo et al., Machine and deep learning methods for radiomics; Bos et al., Largest diameter delineations can substitute 3d tumor volume delineations for radiomics prediction of human papillomavirus status on mri's of oropharyngeal cancer; Yang et al., Computer-aided diagnostic models to classify lymph node metastasis and lymphoma involvement in enlarged cervical lymph nodes using pet/ct; Ieko et al., Assessment of a computed tomography-based radiomics approach for assessing lung function in lung cancer patients; Tang et al., Radiomics model based on features of axillary lymphatic nodes to predict axillary lymphatic node metastasis in breast cancer]. A large portion of brain cancer studies focus on the classification of glioblastoma using Magnetic Resonance Imaging (MRI), which is a standard of care for brain tumors [Zegers et al., Current applications of deep-learning in neuro-oncological mri; Vamvakas et al., Imaging biomarker analysis of advanced multiparametric mri for glioma grading]. Gliomas are the most frequent primary brain tumors in adults. They originate from glial cells and infiltrate the surrounding tissues. According to the histopathologic and molecular World Health Organization (WHO) criteria, they can be divided into grades 1 and 2 (low-grade gliomas, LGG) and grades 3 and 4 (high-grade gliomas, HGG) [Louis et al., The 2021 who classification of tumors of the central nervous system: a summary]. Glioma grade is critical information for patient prognosis and survival, and the application of ML models to glioma grade prediction has grown considerably in recent years. A systematic review of ML models for HGG versus LGG classification reports, for the best-performing models, an accuracy of 0.89 ± 0.09 and an area under the ROC curve (AUC) of 0.92 ± 0.07 [Bahar et al., Machine learning models for classifying high- and low-grade gliomas: A systematic review and quality of reporting analysis]. One of the main challenges for the clinical applicability of Radiomics is the repeatability and reproducibility of radiomic features [Lambin et al., Radiomics: Extracting more information from medical images using advanced feature analysis; Gillies et al., Radiomics: Images are more than pictures, they are data], and there is currently no consensus on which features are the most repeatable and reproducible [Traverso et al., Repeatability and reproducibility of radiomic features: A systematic review]. Repeatability refers to features that do not change significantly when measured on the same subject imaged multiple times with the same instrumentation, protocol and feature extraction pipeline, whereas reproducibility is defined as the consistency of significant findings (e.g. a set of predictive features) across different subjects and studies of the same pathological condition. Another relevant property of radiomic features is robustness, intended as the invariance of the feature values computed on a single acquisition of the same subject across different settings, either in the acquisition parameters or in the radiomic feature extraction pipeline.
      There are several steps in a typical radiomic workflow where different choices of procedures and parameters can be made, thus affecting the robustness of the extracted features: image acquisition, image pre- and post-processing, lesion segmentation and radiomic feature calculation. An international collaboration, the Image Biomarker Standardization Initiative (IBSI, https://ibsi.readthedocs.io/), was established [Mitchell-Hay et al., Investigation of the inter- and intrascanner reproducibility and repeatability of radiomics features in t1-weighted brain mri] with the purpose of standardizing the procedures according to which radiomic features should be defined and extracted. However, feature calculation settings and software versions, which are fundamental aspects of the radiomic workflow [Fornacon-Wood et al., Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform], are not covered by the IBSI.
      Schwier et al. [Schwier et al., Repeatability of multiparametric prostate mri radiomics features] were among the first to study the importance of both image preprocessing and feature extraction settings in radiomics. They considered a small dataset of multiparametric MRI of prostate tumors and evaluated the robustness of radiomic features under different preprocessing techniques and extraction settings. In particular, they studied the impact of different image normalization algorithms, different image filters and different bin widths for discretizing the image intensity values.
      Since image normalization [Saltybaeva et al., Robustness of radiomic features in magnetic resonance imaging for patients with glioblastoma: Multi-center study] and intensity discretization affect the robustness of radiomic features, radiomic studies should carefully report the steps and settings used to implement the radiomic workflow [Hoebel et al., Radiomics repeatability pitfalls in a scan-rescan mri study of glioblastoma].
      Currently, there is no agreement on which image normalization and intensity discretization methods best maximize feature robustness. These issues are particularly relevant in MRI, where images acquired with typical clinical protocols are not quantitative. Moreover, the contrast between different tissues depends on the image acquisition parameters and hardware, which directly affects the computation of radiomic features [Chirra et al., Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI, Medical Imaging 2018: Computer-Aided Diagnosis, vol. 10575, SPIE, https://doi.org/10.1117/12.2293992; Um et al., Impact of image preprocessing on the scanner dependence of multi-parametric mri radiomic features and covariate shift in multi-institutional glioblastoma datasets].
      We focused our study on how different choices of normalization and intensity discretization parameters influence the robustness of radiomic features and their predictive power. We considered the specific case study of glioma grade categorization by means of a radiomic approach combined with an ML classifier applied to multiparametric brain MRI data. The main objective of this work is to identify a suitable pipeline, including an appropriate image normalization step and discretization strategy preliminary to radiomic feature computation, that generates a reliable set of features specifically for MRI studies (MRI-reliable radiomic features). Moreover, we present a detailed study of the robustness of radiomic features with respect to different choices of image normalization and intensity discretization parameters, in order to highlight a set of robust features that can be safely used to analyze MRI data. Finally, we directly compare the predictive performance of the whole set of MRI-reliable features with that of the restricted set of robust features in the case study of binary glioma grade categorization with an ML classifier.

      2. Materials and methods

      Participants and data description

      The data used in this study included two datasets of multiparametric MRI scans of patients with brain tumors, made publicly available on The Cancer Imaging Archive (TCIA, https://www.cancerimagingarchive.net/) [Clark et al., The cancer imaging archive (tcia): Maintaining and operating a public information repository] through the following data collections: The Cancer Genome Atlas Glioblastoma Multiforme (TCGA-GBM) [Bakas et al., Segmentation labels for the pre-operative scans of the tcga-gbm collection (data set), 2017, https://doi.org/10.7937/K9/TCIA.2017.KLXWJJ1Q] and The Cancer Genome Atlas Low Grade Glioma (TCGA-LGG) [Bakas et al., Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection (data set), 2017, https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF]. Glioblastoma multiforme (GBM) cases were considered high-grade gliomas (HGG).
      The complete radiological data of the TCGA-GBM and TCGA-LGG collections consist of 262 and 199 multiparametric MRI scans provided by 8 and 5 different institutions, respectively. Details on the patients acquired at the different centers and on the scanners used are reported by Bakas et al. [Bakas et al., Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features]. Each patient had preoperative images in four modalities (T1, contrast-enhanced T1, T2, FLAIR). All images were preprocessed by the BraTS organization committee using the FMRIB Software Library (FSL): each image was registered onto the same anatomical template, interpolated to the same resolution (1 × 1 × 1 mm³) and skull-stripped [Bakas et al., Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features].
      The data included in this study were the subset of processed NIfTI images with segmentation labels from these collections, namely 102 and 65 scans from the TCGA-GBM and TCGA-LGG collections, respectively. Two types of segmentation are available for these datasets: the first was produced by GLISTRboost, which was awarded first prize in the International Multimodal Brain Tumor Image Segmentation challenge 2015 (BraTS'15); the second is a manual revision and correction of the first, performed by expert clinicians. The segmentation output delineates masks of different tumor sub-regions: the enhancing part of the tumor core (ET, label 4), the non-enhancing part of the tumor core (NET, label 1), and the peritumoral edema (ED, label 2) [Menze et al., The multimodal brain tumor image segmentation benchmark (brats)], as shown for a representative image of the dataset in Fig. 1.
      Fig. 1. Examples of the three types of ROI masks used in our study. The images of a subject of the HGG group of the BraTS dataset are shown. The yellow mask (a) indicates the whole tumor, the green mask (b) indicates the edema (ED), the cyan one (c) indicates the enhancing part of the tumor core (ET) and the red one (c) indicates the non-enhancing part of the tumor core (NET). The edema structure is visible in the T2-FLAIR image (a, b, d), whereas the non-enhancing structures are visible in the T1-Gd image (c, e). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
      We used the manually corrected masks to ensure that they were appropriate for an accurate classification. Considering only subjects for which manually revised masks were available, we obtained a final sample of 97 subjects from the TCGA-GBM collection (HGG label) and 61 from the TCGA-LGG collection (LGG label).
      For each subject we analyzed the four available modalities. The list of the IDs of the subjects analyzed in this study is provided in the Supplementary Materials.
      In order to extract multi-regional radiomic features we combined the ED, NET and ET masks to obtain the entire tumor volume.
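      The mask combination described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the segmentation volume has already been loaded as an integer label array (for example with nibabel, not shown), uses the BraTS label values given in the text (NET = 1, ED = 2, ET = 4), and the function names are hypothetical.

```python
import numpy as np

# BraTS label values as listed in the text: NET = 1, ED = 2, ET = 4.
NET_LABEL, ED_LABEL, ET_LABEL = 1, 2, 4


def region_mask(seg: np.ndarray, label: int) -> np.ndarray:
    """Binary mask (uint8) of a single tumor sub-region."""
    return (seg == label).astype(np.uint8)


def whole_tumor_mask(seg: np.ndarray) -> np.ndarray:
    """Binary mask (uint8) of the whole tumor: union of the NET, ED and ET labels."""
    return np.isin(seg, [NET_LABEL, ED_LABEL, ET_LABEL]).astype(np.uint8)


# Toy 3D label map standing in for a BraTS segmentation volume.
seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[1, 1, 1] = NET_LABEL
seg[1, 1, 2] = ED_LABEL
seg[2, 2, 2:4] = ET_LABEL
print(int(whole_tumor_mask(seg).sum()))  # 4 voxels: 1 NET + 1 ED + 2 ET
```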

      Radiomic and ML analysis workflow

      A typical radiomic and ML analysis workflow, based on a radiomic feature extraction step followed by a machine learning classification, was followed in this study, as depicted in Fig. 2. In particular, we focused on the image normalization and feature extraction steps, which are detailed below.
      Fig. 2. A typical radiomic and ML workflow, with specific reference to the choices made in our analysis: starting from image acquisition (MRI in our work), tumor masks should be provided either by automated/semi-automated algorithms or by experts; then, image normalization strategies have to be implemented (e.g. linear intensity scaling, normalization to the intensity of an unaffected anatomical region); the extraction of radiomic features follows, paying attention to the choice of any free parameters of the radiomic software packages; an ML classifier can then be used to analyze the radiomic features, provided that labels indicating the diagnostic categories are available; finally, the ML classification performance should be evaluated according to a standard train-test splitting criterion, e.g. the k-fold cross-validation scheme.
      In the rest of this study, unless explicitly indicated otherwise, we evaluated tumor grading considering only the whole-tumor ROI and the dataset composed of the radiomic features computed for all available MRI sequences (combination of all modalities).

      Image intensity normalization

      Most MRI protocols adopted in clinical practice include a series of acquisition sequences that generate several clinically relevant images, giving the technique its multiparametric nature. MRI images are typically contrast-based, non-quantitative images: they show useful contrasts between tissues (e.g. T1-weighted, T2-weighted, FLAIR, contrast-enhanced imaging), but the voxel intensities are on an arbitrary scale. This characteristic affects the values of some radiomic features, and consequently their robustness, and should therefore be taken into account in radiomic studies [Saltybaeva et al., Robustness of radiomic features in magnetic resonance imaging for patients with glioblastoma: Multi-center study].
      To make the gray-value distributions of images acquired with the same MRI sequence comparable across subjects, an intensity normalization procedure can be adopted. In our study, we applied the following three normalization algorithms to the images before computing the radiomic features:
      • 1.
        Linear normalization of voxel intensity values between the maximum and minimum gray values of the image (Norm_MinMax): each voxel intensity value is transformed by subtracting the minimum value and then dividing by the difference between the maximum and the minimum of the intensity values of the whole brain:
      $$x_i^{\mathrm{MinMax}} = \frac{x_i - \min(\bar{x})}{\max(\bar{x}) - \min(\bar{x})}$$


      where $x_i$ is the intensity at a single voxel location and $\bar{x}$ denotes the 3D image array masked with the segmented brain. The normalized intensity values lie in the [0, 1] range.
      • 2.
        Linear normalization of voxel intensity values by subtracting the median and scaling the data according to the inter-quartile range, referred to as RobustScaler (Norm_RobustScaler). Each voxel intensity value is transformed by subtracting the median value and then dividing by the inter-quartile range (IQR), defined as the difference between the 3rd quartile $Q_3$ (75th percentile) and the 1st quartile $Q_1$ (25th percentile) of the intensity values of the whole brain. The Norm_RobustScaler can be formulated as:
      $$x_i^{\mathrm{RobustScaler}} = \frac{x_i - \mathrm{median}(\bar{x})}{Q_3 - Q_1}$$


      • 3.
        Normalization to the intensity of the brainstem region using RobustScaler (Norm_Brainstem). For this normalization we selected a ROI in the brainstem and adapted the RobustScaler formula: the intensity value of each voxel is transformed by subtracting the median value and then dividing by the IQR of the intensity values of the brainstem, so that here $Q_1$ and $Q_3$ are the quartiles of $\bar{x}_{\mathrm{Brainstem}}$, the image array masked with the brainstem ROI:
      $$x_i^{\mathrm{Brainstem}} = \frac{x_i - \mathrm{median}(\bar{x}_{\mathrm{Brainstem}})}{Q_3 - Q_1}$$


      We expect the intensity of the brainstem region to be more homogeneous, which we believe makes this normalization more robust.
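      The three normalization formulas above can be sketched in NumPy as follows. This is a minimal re-implementation of the equations, not the authors' code and not scikit-learn's RobustScaler class; the function names are hypothetical, quartiles use NumPy's default linear interpolation, and the masks are assumed to be arrays with the same shape as the image. Voxels outside the reference mask are transformed too, since only tumor-mask voxels are used later for feature extraction.

```python
import numpy as np


def norm_minmax(img: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """Norm_MinMax: (x - min) / (max - min), with min and max taken over brain voxels."""
    img = img.astype(np.float64)
    brain = img[brain_mask > 0]
    lo, hi = brain.min(), brain.max()
    if hi == lo:
        raise ValueError("constant intensities inside the brain mask")
    return (img - lo) / (hi - lo)


def _robust(img: np.ndarray, ref_mask: np.ndarray) -> np.ndarray:
    """(x - median) / (Q3 - Q1), with statistics taken over the reference voxels only."""
    img = img.astype(np.float64)
    q1, med, q3 = np.percentile(img[ref_mask > 0], [25, 50, 75])
    if q3 == q1:
        raise ValueError("zero inter-quartile range in the reference region")
    return (img - med) / (q3 - q1)


def norm_robust_scaler(img: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """Norm_RobustScaler: the reference region is the whole brain."""
    return _robust(img, brain_mask)


def norm_brainstem(img: np.ndarray, brainstem_mask: np.ndarray) -> np.ndarray:
    """Norm_Brainstem: the reference region is the brainstem ROI."""
    return _robust(img, brainstem_mask)
```

Norm_RobustScaler and Norm_Brainstem share one helper because they differ only in the region from which the median and quartiles are estimated.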
      Fig. 3 shows the coregistration and normalization process. To segment the brainstem, the multiparametric MR images of each patient and the atlas were first coregistered to the MNI space using ANTsPy. In particular, the SyNRA transformation, as defined in ANTsPy, was applied; it consists of a sequence of a rigid, an affine and a deformable transformation, with mutual information as the optimization metric. After coregistration, all images were intensity-normalized according to the Norm_Brainstem procedure described above. This procedure normalizes image intensity to a portion of the brain that is unaffected by the tumor. As a control measure, we implemented a check to ensure that the brainstem mask does not intersect the tumor mask.
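      The control measure on the brainstem ROI amounts to counting shared voxels between two binary masks in the same space. The sketch below assumes both masks are already co-registered NumPy arrays of equal shape (the ANTsPy registration itself is not reproduced here); the function names are hypothetical, and raising an error on overlap is one possible policy, since the text does not say how overlapping cases were handled.

```python
import numpy as np


def brainstem_tumor_overlap(brainstem_mask: np.ndarray, tumor_mask: np.ndarray) -> int:
    """Number of voxels belonging to both the brainstem ROI and the tumor mask."""
    return int(np.count_nonzero(np.logical_and(brainstem_mask > 0, tumor_mask > 0)))


def checked_brainstem_mask(brainstem_mask: np.ndarray, tumor_mask: np.ndarray) -> np.ndarray:
    """Return the brainstem mask unchanged, or raise if it touches the tumor."""
    n = brainstem_tumor_overlap(brainstem_mask, tumor_mask)
    if n > 0:
        raise ValueError(f"brainstem ROI overlaps the tumor in {n} voxels")
    return brainstem_mask
```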
      Fig. 3. The coregistration and normalization process. MR images (a) and atlas (b) were coregistered to the MNI space. After coregistration, the brainstem ROI was identified (c) and all images were normalized to the intensity values of the brainstem (d).

      Radiomic feature extraction

      The radiomic features of the multiparametric MRI images were computed with the open-source Python package PyRadiomics (v3.0.1) [Van Griethuysen et al., Computational radiomics system to decode the radiographic phenotype]. This platform was validated by its developers against the IBSI benchmark values [Zwanenburg et al., The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping]. Radiomic features were computed for all MRI sequences available for each patient using the manually revised masks of the segmented lesions. In particular, we considered the NET, the ED and the whole-tumor masks. For each region we extracted 107 features: 18 histogram-based features (also called first-order statistics or intensity features) computed on the voxel gray-level histograms; 14 shape-based features, which depend only on the shape of the mask; and 75 texture-based features, derived from the gray-level co-occurrence matrix (GLCM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM) and neighboring gray tone difference matrix (NGTDM).
      For the study of the robustness of radiomic features, we excluded the shape-based features, since their values are not influenced by the normalization and discretization methods: these features describe the ROI morphology and are independent of the gray-level values, as they are derived directly from the binary ROI mask [].
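      One way to configure the extraction described above with PyRadiomics' Python interface is sketched below. The exact configuration used in the study is not reported beyond the settings in the text, so this is an assumption-laden illustration: the function names are hypothetical, `extract_features` needs the pyradiomics package (imported lazily so the other helpers work without it), and the feature-name prefix `original_shape_` reflects PyRadiomics' naming for features computed on the original image.

```python
# Feature classes enabled for 3D extraction; the text reports 107 features per
# region: 18 first-order, 14 shape and 75 texture features.
FEATURE_CLASSES = ["firstorder", "shape", "glcm", "glrlm", "glszm", "gldm", "ngtdm"]


def make_settings(bin_count: int) -> dict:
    """Extraction settings for one discretization level.

    binCount requests a fixed number of bins; normalize stays False because the
    images are assumed to be intensity-normalized beforehand."""
    return {"binCount": bin_count, "normalize": False}


def extract_features(image_path: str, mask_path: str, bin_count: int, label: int = 1) -> dict:
    """Run PyRadiomics on one image/mask pair (requires the pyradiomics package)."""
    from radiomics import featureextractor  # imported lazily on purpose

    extractor = featureextractor.RadiomicsFeatureExtractor(**make_settings(bin_count))
    extractor.disableAllFeatures()
    for feature_class in FEATURE_CLASSES:
        extractor.enableFeatureClassByName(feature_class)
    result = extractor.execute(image_path, mask_path, label=label)
    # Keep feature values only; "diagnostics_*" entries are metadata.
    return {k: float(v) for k, v in result.items() if k.startswith("original_")}


def drop_shape_features(features: dict) -> dict:
    """Remove mask-only shape features before the robustness (ICC) analysis."""
    return {k: v for k, v in features.items() if not k.startswith("original_shape_")}
```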

      Voxel intensity discretization

      The computation of the texture-based features and of some of the intensity features, as defined in PyRadiomics, requires binning the intensity histogram. This discretization step can be performed using either a fixed bin width (absolute binning) or a preset number of bins (relative binning) adapted to the range of intensity values in the masked image [Schwier et al., Repeatability of multiparametric prostate mri radiomics features]. When the intensities have been normalized, the default PyRadiomics binning (a fixed bin width of 25 intensity units) can produce meaningless bins for the computation of texture features such as those derived from the GLCM: normalized intensities span a much narrower range (e.g. [0, 1] after Norm_MinMax), so nearly all voxels fall into one or very few bins. To avoid this problem, the number of bins has to be chosen explicitly [Hoebel et al., Radiomics repeatability pitfalls in a scan-rescan mri study of glioblastoma].
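      The collapse caused by a fixed bin width of 25 on normalized intensities can be illustrated with a small NumPy sketch of absolute binning. This is an illustration under assumptions, not PyRadiomics' own code: bins are aligned to multiples of the bin width and numbered from 1, the function name is hypothetical, and the ROI values are synthetic stand-ins for Norm_MinMax output.

```python
import numpy as np


def discretize_fixed_width(values, bin_width: float) -> np.ndarray:
    """Absolute binning: constant-width bins aligned to multiples of bin_width.

    Returns 1-based gray levels; the lowest occupied bin becomes level 1."""
    values = np.asarray(values, dtype=np.float64)
    lowest = np.floor(values.min() / bin_width)
    return (np.floor(values / bin_width) - lowest).astype(int) + 1


rng = np.random.default_rng(0)
roi = rng.uniform(0.2, 0.9, size=1000)  # toy ROI intensities after Norm_MinMax, in [0, 1]

# With a bin width of 25 intensity units, every voxel falls into the same bin.
print(np.unique(discretize_fixed_width(roi, 25.0)))  # [1]
# A bin width matched to the normalized range keeps many gray levels.
print(len(np.unique(discretize_fixed_width(roi, 0.05))))
```

With a single gray level, every co-occurrence falls into one matrix cell, so GLCM-based features no longer describe texture.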
      We evaluated how the total number of bins, the parameter that determines the dynamic range of the discretized gray values and that the Image Biomarker Standardisation Initiative (IBSI) recommends setting when dealing with non-quantitative data [Zwanenburg et al., The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping], influences the robustness of radiomic features. We computed the radiomic features for the whole dataset setting the number of intensity discretization levels (the binCount parameter in PyRadiomics) to 8, 16, 128 and 512.
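      Relative binning with a fixed number of bins can be sketched with the IBSI fixed-bin-number formula, in which each value is mapped to one of $N_g$ equal-width bins spanning the ROI's own intensity range. The NumPy function below is a hypothetical illustration of that formula, not PyRadiomics' internal implementation, and the ROI values are synthetic; it loops over the bin counts used in the study.

```python
import numpy as np


def discretize_fixed_count(values, n_bins: int) -> np.ndarray:
    """Relative binning (IBSI fixed bin number): n_bins equal-width bins spanning
    the ROI intensity range; returns gray levels in 1..n_bins."""
    values = np.asarray(values, dtype=np.float64)
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        return np.ones(values.shape, dtype=int)
    levels = np.floor(n_bins * (values - vmin) / (vmax - vmin)).astype(int) + 1
    return np.minimum(levels, n_bins)  # the ROI maximum goes to the top bin


rng = np.random.default_rng(0)
roi = rng.normal(loc=1.0, scale=0.3, size=5000)  # toy normalized ROI intensities

for n_bins in (8, 16, 128, 512):  # the bin counts used in the study
    levels = discretize_fixed_count(roi, n_bins)
    print(n_bins, levels.min(), levels.max(), len(np.unique(levels)))
```

Because the gray levels depend only on where each voxel sits within the ROI's own [min, max] range, any increasing linear rescaling of the intensities (which is what each of the three normalizations above applies to an image) leaves the discretized ROI unchanged, up to floating-point rounding. This may help explain why texture features computed after fixed-bin-count discretization turn out to be insensitive to the normalization strategy in Fig. 6(b).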

      Evaluation of the predictive power of radiomic features

      We evaluated the predictive power of radiomic features in the LGG versus HGG categorization using random forest (RF) classifiers [Breiman, Random forests, 1999]. The binary classification performance was evaluated across the different image normalization methods and image discretization settings.
      An RF is trained by fitting a number of decision trees on randomly selected samples of the data, obtaining a prediction from each tree, and then selecting the final prediction by majority voting [Breiman, Random forests, 1999]. The metric used to evaluate the classification performance was the area under the ROC curve (AUC), which can be interpreted as the probability that the classifier ranks a randomly chosen positive example higher than a randomly chosen negative example, assuming that higher scores indicate the positive class [Metz, Receiver operating characteristic analysis: A tool for the quantitative evaluation of observer performance and imaging systems]. We used the RandomForestClassifier of scikit-learn [Pedregosa et al., Scikit-learn: Machine learning in Python], an open-source machine learning Python library. We set the number of trees to 500 and the number of candidate predictors considered at each split to $\sqrt{n_P}$, where $n_P$ is the number of predictors.
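      The probabilistic reading of the AUC given above can be checked numerically: counting, over all positive/negative pairs, how often the positive example receives the higher score (ties counting one half) reproduces scikit-learn's `roc_auc_score`. The helper name below is hypothetical and the scores are toy values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_by_ranking(y_true, scores) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=np.float64)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc_by_ranking(y, s), roc_auc_score(y, s))  # 0.75 0.75
```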
      The RF model was trained according to a stratified 5-fold cross-validation scheme. The AUC values of the 5 test folds were collected to compute the mean AUC and its standard deviation.
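      The classifier settings and the cross-validation scheme described above can be sketched with scikit-learn as follows. This is a hedged reconstruction, not the authors' script: shuffling the folds, the random seeds, the population standard deviation (NumPy's default) and the synthetic data from `make_classification`, generated with the study's class balance as a stand-in for the radiomic feature matrix, are all assumptions; `cv_auc` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_auc(X, y, n_trees: int = 500, seed: int = 0):
    """Mean and standard deviation of the AUC over a stratified 5-fold CV."""
    clf = RandomForestClassifier(
        n_estimators=n_trees,  # number of trees
        max_features="sqrt",  # candidate predictors per split: sqrt(n_P)
        random_state=seed,
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    return aucs.mean(), aucs.std(), aucs


# Synthetic stand-in with roughly the study's class balance (61 LGG = 0, 97 HGG = 1).
X, y = make_classification(
    n_samples=158, n_features=30, n_informative=5, weights=[61 / 158], random_state=0
)
mean_auc, sd_auc, _ = cv_auc(X, y)
print(f"AUC = {mean_auc:.2f} ± {sd_auc:.2f}")
```

Stratification keeps the LGG/HGG proportion approximately constant in every test fold, which matters with an imbalanced sample of this size.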

      Robustness of radiomic features

      The open-source Python package Pingouin [Vallat, Pingouin: statistics in python] was used for the statistical analysis. The Intraclass Correlation Coefficient (ICC) was used as a measure of the robustness of the radiomic features. In particular, we selected the two-way mixed-effects model with average raters and absolute agreement, for which the ICC is calculated as follows [Liljequist et al., Intraclass correlation – a discussion and demonstration of basic features]:
      $$\mathrm{ICC} = \frac{MS_R - MS_E}{MS_R + (MS_C - MS_E)/n}$$


      where $MS_R$ is the mean square for rows (subjects); $MS_C$ is the mean square for columns (raters, i.e. in our analysis, the feature values computed using different normalization methods or discretization parameters); $MS_E$ is the mean square for error (variability due to differences in the evaluations of the subjects by the raters); and $n$ is the number of subjects. ICC values range between 0 and 1, with values closer to 1 indicating higher robustness [Koo and Li, A guideline of selecting and reporting intraclass correlation coefficients for reliability research]. The radiomic features were stratified according to their degree of robustness: poor (ICC ≤ 0.5), moderate (0.5 < ICC ≤ 0.75), good (0.75 < ICC ≤ 0.9) and excellent (ICC > 0.9).
      We studied the effect of normalization (Intra-Normalization ICC) and the effect of image intensity discretization (Intra-Discretization ICC) on the robustness of features.
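      The ICC formula above can be computed directly from two-way ANOVA mean squares, with subjects as rows and the different normalization or discretization settings as columns. The NumPy sketch below is for illustration only (the study used Pingouin, whose `intraclass_corr` output should, as far as we know, report this quantity in the row labelled ICC2k); function names are hypothetical, and the inclusive upper boundaries of the robustness classes follow the stratification as written above.

```python
import numpy as np


def icc_absolute_average(ratings) -> float:
    """Two-way, absolute-agreement, average-measures ICC.

    ratings: array (n_subjects, k_raters). Rows are subjects; columns are the
    "raters", e.g. one radiomic feature computed under k normalization methods."""
    x = np.asarray(ratings, dtype=np.float64)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return float((ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n))


def robustness_category(icc: float) -> str:
    """Robustness classes used in the text."""
    if icc <= 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


v = np.array([1.0, 2.0, 3.0, 4.0])
print(icc_absolute_average(np.column_stack([v, v, v])))  # 1.0: identical raters
icc_shift = icc_absolute_average(np.column_stack([v, v + 1.0]))
print(round(icc_shift, 4), robustness_category(icc_shift))  # 0.8696 good
```

The second example shows why absolute agreement matters here: a constant offset between settings leaves subject ranking intact but still lowers the ICC, because the column mean square $MS_C$ enters the denominator.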

      3. Implementation and results

      Following the analysis pipeline depicted in Fig. 2, we first evaluated the effect of the different normalization procedures on the grayscale histograms of the images. Then, we studied the impact of the different intensity normalization strategies and the different choices of voxel intensity discretization on the LGG vs HGG discrimination performances obtained by the RF classifier. For this step we separately analyzed the performances obtained by using the intensity-based and texture-based radiomic features. Afterward, we evaluated the predictive power of the radiomic features computed with the best image normalization and discretization choices highlighted by the above-mentioned procedures. Then we estimated the robustness of the radiomic features in terms of ICC across the different image intensity normalization protocols and different choices of intensity discretization, in order to define a set of robust features. Finally, we made a comparison between the performance achieved by using all radiomic features extracted with or without image normalization and the set of robust features.

      Evaluation of the impact of image intensity normalization

      The optimal intensity normalization technique is expected to produce better overlap among the intensity histograms of the normalized images than among those of the original images. Fig. 4 compares the histograms of the intensity values of the original and normalized T1 images of some representative subjects. For the population considered in this study, all the normalization techniques improved the similarity among the histograms. However, the histograms appear best aligned when the Norm_RobustScaler and Norm_Brainstem normalization procedures are applied. Fig. 5 shows a slice from some representative subjects, all displayed with the same fixed gray-level window. For the original images, a single window was not appropriate for all subjects. After normalization (Norm_RobustScaler), brightness and contrast are more similar across images, which makes the images more comparable.
      Fig. 4. Histogram of the grey level intensity values of the T1-weighted images of some representative subjects: (a) original image; (b) image normalized to the maximum and minimum of the intensity values of the whole brain (Norm_MinMax); (c) image normalized with respect to the median and inter-quartile range of the intensity values of the whole brain (Norm_RobustScaler); (d) image normalized with respect to the median and inter-quartile range of the intensity values of the brainstem (Norm_Brainstem).
      Fig. 5. A comparison among the original and normalized T1-weighted images of some representative subjects, displayed with a fixed gray-level window. After normalization, the images are more comparable to each other.
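The three normalizations described in the caption of Fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions and not the study's code. Images are NumPy arrays. The whole-brain and brainstem masks are boolean arrays assumed to be available, since this section does not describe how the brainstem region is delineated. Norm_MinMax is assumed to rescale the whole-brain range to [0, 1]. The function names are hypothetical.

```python
import numpy as np


def norm_minmax(image, brain_mask):
    """Norm_MinMax-like: whole-brain min/max mapped to 0/1 (assumed target range)."""
    img = np.asarray(image, dtype=float)  # float avoids unsigned-integer wrap-around
    ref = img[brain_mask]
    lo, hi = ref.min(), ref.max()
    return (img - lo) / (hi - lo)


def norm_median_iqr(image, reference_mask):
    """Subtract the median and divide by the inter-quartile range of a reference region.

    With a whole-brain mask this mimics Norm_RobustScaler; with a brainstem
    mask it mimics Norm_Brainstem.
    """
    img = np.asarray(image, dtype=float)
    q1, median, q3 = np.percentile(img[reference_mask], [25, 50, 75])
    return (img - median) / (q3 - q1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.normal(600.0, 80.0, size=(32, 32, 16))  # synthetic T1-like volume
    brain = np.zeros(image.shape, dtype=bool)
    brain[4:28, 4:28, 2:14] = True  # hypothetical whole-brain mask
    brainstem = np.zeros(image.shape, dtype=bool)
    brainstem[14:18, 14:18, 2:6] = True  # hypothetical brainstem mask

    minmax_scaled = norm_minmax(image, brain)  # Norm_MinMax-like
    robust_scaled = norm_median_iqr(image, brain)  # Norm_RobustScaler-like
    brainstem_scaled = norm_median_iqr(image, brainstem)  # Norm_Brainstem-like
    print(minmax_scaled[brain].min(), minmax_scaled[brain].max())
    print(np.median(robust_scaled[brain]), np.median(brainstem_scaled[brainstem]))
```

The only difference between the last two schemes is the reference region. Using an unaffected region such as the brainstem keeps the tumor's own intensities out of the normalization statistics.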
      The impact of the normalization techniques was also studied in terms of classification performance. Fig. 6 reports the results obtained with the binary classification approach described in Sec. 2.3, with the intensity and the texture features considered separately. As shown in Fig. 6(b), the performance obtained using only the texture features is not influenced by the normalization strategy. In contrast, the performance achieved using only the intensity features depends on the normalization technique applied, as visible in Fig. 6(a). The trend of the AUC values across normalization procedures suggests that both Norm_RobustScaler and Norm_Brainstem are good choices for extracting informative intensity-based radiomic features. Moreover, Norm_Brainstem reduces the fluctuations (AUC = 0.92 ± 0.04) with respect to Norm_RobustScaler (AUC = 0.92 ± 0.06).
      Fig. 6. Performance achieved in LGG vs HGG discrimination using different numbers of gray levels for intensity discretization and different normalization methods, on intensity (a) and texture (b) features.
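Sec. 2.3 is not part of this excerpt. The following is therefore only a hedged sketch of how mean ± standard deviation AUC values like those in Fig. 6 can be obtained with a random forest in scikit-learn, which is cited in the references. The stratified 5-fold cross-validation, the number of trees and the synthetic data from `make_classification` are assumptions for illustration. `cv_auc` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_auc(features, labels, n_splits=5, seed=0):
    """Mean and standard deviation of the ROC AUC of a random forest
    evaluated with stratified k-fold cross-validation."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, features, labels, cv=cv, scoring="roc_auc")
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for a radiomic feature table (158 subjects);
    # the class proportions here are arbitrary, not the study's LGG/HGG ratio.
    X, y = make_classification(n_samples=158, n_features=50, n_informative=8,
                               weights=[0.3, 0.7], random_state=0)
    mean_auc, std_auc = cv_auc(X, y)
    print(f"AUC = {mean_auc:.2f} ± {std_auc:.2f}")
```

Stratified folds keep the class ratio similar in every split. This matters when one class is much smaller than the other.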

      Evaluation of the impact of the number of intensity discretization levels

      Fig. 6 also summarizes the effect of the number of grey level discretization values (8, 16, 128 and 512) on classification performance. The performance achieved using only the intensity features (Fig. 6(a)) is almost stable across the different numbers of discretization levels, for any normalization strategy. In contrast, as can be inferred from Fig. 6(b), varying the number of intensity discretization levels under the same normalization leads to different performance when texture features are analyzed.
      The trend of the AUC values across the different numbers of intensity discretization levels suggests that 16 and 128 levels are good choices for extracting informative texture-based radiomic features.
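The fixed bin count discretization mentioned in the Discussion can be illustrated with the IBSI-style formula for a fixed number of bins N_g. Each ROI intensity x is mapped to floor(N_g (x − x_min)/(x_max − x_min)) + 1, and x = x_max is assigned to N_g. pyradiomics exposes the choice of a fixed number of bins through its `binCount` setting. The stand-alone function below does not call pyradiomics, and its name is hypothetical. The demo, which uses synthetic data, counts how many distinct gray levels each setting examined here produces.

```python
import numpy as np


def discretize_fixed_bin_count(values, n_bins):
    """Map ROI intensities to integer gray levels 1..n_bins (fixed bin number).

    level = floor(n_bins * (x - x_min) / (x_max - x_min)) + 1, with the
    maximum intensity assigned to the last bin.
    """
    x = np.asarray(values, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # constant ROI: a single gray level
        return np.ones(x.shape, dtype=int)
    levels = np.floor(n_bins * (x - x_min) / (x_max - x_min)).astype(int) + 1
    return np.clip(levels, 1, n_bins)  # x == x_max would otherwise be n_bins + 1


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    roi = rng.normal(0.0, 1.0, size=5_000)  # synthetic normalized ROI intensities
    for n_bins in (8, 16, 128, 512):
        levels = discretize_fixed_bin_count(roi, n_bins)
        print(n_bins, "bins ->", len(np.unique(levels)), "distinct gray levels used")
```

Texture matrices such as the GLCM are indexed by these levels, so their size and sparsity change with `n_bins`. This is consistent with texture features being sensitive to the number of discretization levels.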

      Evaluation of the predictive power of radiomic features across different modalities and tumor masks

      We defined a set of MRI-reliable radiomic features. It is composed of the intensity and texture features extracted with the best normalization (Norm_Brainstem) and discretization (128 bins) settings.
      We evaluated the classification performance for each tumor zone and MRI modality by considering the intensity features, the texture features and the whole set of MRI-reliable radiomic features. Fig. 7 shows the results obtained with the RF binary classification approach described in Sec. 2.3. Performance varies across MRI modalities and tumor ROI masks. However, performance is maximized when the whole tumor ROI and the combination of all modalities are considered, and the highest AUC is obtained with the MRI-reliable feature set (Fig. 7(a)). The intensity features (AUC = 0.92 ± 0.04) capture the difference between low- and high-grade gliomas slightly more effectively than the texture features, which follow closely (AUC = 0.90 ± 0.07), as shown in Fig. 7(c) and (b), respectively.
      Fig. 7. Heatmaps reporting the AUC obtained in the LGG vs HGG classification using features extracted with the best normalization method (Norm_Brainstem) and gray-level discretization (128 bins). The AUC is reported for the different tumor regions and their combinations (NET, ED, ED + NET and Whole Tumor) and for the different MRI modalities (T1, T1-Gd, T2, FLAIR), both separately and in combination. The heatmaps were generated by considering the MRI-reliable features (a), only the texture features (b) and only the intensity features (c).

      Robustness of radiomic features

      The ICC of radiomic features was calculated for the intensity and texture features, across the different choices in intensity discretization and normalization pipelines. In particular, we computed the Intra-Normalization ICC for features extracted with 128 discretization levels and the Intra-Discretization ICC on the features extracted from Norm_Brainstem images.
      Table 1 reports the proportion of intensity and texture features with excellent robustness as the discretization choice or the normalization strategy varies. First, when the normalization pipeline varies, the subset of the most robust features consists of 16 intensity features. Second, when the image intensity discretization varies, the subset of the most robust features consists of 43 texture features. The lists of the most robust features are provided in the Supplementary Materials. As expected, almost all intensity features show excellent robustness when the number of discretization levels varies. Likewise, all texture features show excellent robustness when the normalization method varies.
      Table 1. Number of most robust features over the total number of features, for different choices of intensity discretization and normalization pipelines. The features are grouped by type (intensity and texture) and were extracted from the whole tumor ROI and the combination of all modalities.
      Type of features | Most robust features, Intra-Discretization analysis | Most robust features, Intra-Normalization analysis
      Intensity features | 68/72 | 16/72
      Texture features | 43/300 | 300/300
      Fig. 8 shows the percentage of features, stratified by degree of robustness and grouped by MRI modality and/or feature type, obtained from the ICC computation. For the Intra-Discretization ICC, excellent robustness can be observed for GLCM and NGTDM features (Fig. 8, panel a). In addition, features are more stable for the T1-Gd modality, possibly because the contrast agent provides relevant and dominant information for texture features (Fig. 8, panel b). For the Intra-Normalization ICC, excellent robustness emerged for entropy, kurtosis, skewness and uniformity in all image modalities (Fig. 8, panel c). A likely explanation is that these features depend on the shape of the histogram representing the gray-level distribution rather than on the absolute gray-level values, which are affected by normalization.
      Fig. 8. Proportion of the levels of robustness of radiomic features obtained using the ICC: Intra-Discretization robustness of texture features extracted from Norm_Brainstem normalized images, grouped by texture feature class (a) and MRI modality (b). Excellent robustness can be observed for NGTDM and GLCM features, and the features are more stable for the T1-Gd modality. (c) Intra-Normalization robustness of intensity features extracted with 128 discretized gray levels. Excellent robustness emerged for the entropy, kurtosis, skewness and uniformity features.
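The restricted robust set can be expressed as a set intersection. This assumes, as the caption of Table 2 suggests, that a feature is kept only when its ICC is excellent (> 0.9) in both the Intra-Normalization and the Intra-Discretization analyses. The dictionaries of per-feature ICC values, the feature names and the function names below are hypothetical.

The counts in Table 1 allow a consistency check. If all 16 normalization-robust intensity features are also among the 68 discretization-robust ones, the intersection holds 16 + 43 = 59 of the 72 + 300 = 372 features. That is 59/372 ≈ 15.9%, in line with the roughly 16% reported in the Discussion.

```python
EXCELLENT_ICC = 0.9  # "excellent" robustness threshold used in the text


def excellent(icc_by_feature, threshold=EXCELLENT_ICC):
    """Names of the features whose ICC exceeds the threshold."""
    return {name for name, icc in icc_by_feature.items() if icc > threshold}


def robust_feature_set(icc_intra_norm, icc_intra_disc, threshold=EXCELLENT_ICC):
    """Features rated excellent in both robustness analyses (assumed rule)."""
    return excellent(icc_intra_norm, threshold) & excellent(icc_intra_disc, threshold)


if __name__ == "__main__":
    # Hypothetical ICC values for three features
    intra_norm = {"firstorder_Entropy": 0.97, "firstorder_Mean": 0.41, "glcm_Contrast": 0.99}
    intra_disc = {"firstorder_Entropy": 0.93, "firstorder_Mean": 0.99, "glcm_Contrast": 0.62}
    print(sorted(robust_feature_set(intra_norm, intra_disc)))  # ['firstorder_Entropy']

    # Share of robust features implied by the Table 1 counts
    print(f"{(16 + 43) / (72 + 300):.1%}")  # 15.9%
```

The selection never looks at the LGG/HGG labels. This is why, as argued in the Discussion, robustness alone does not guarantee predictive power.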

      Predictive power of robust features only versus all

      We compared the LGG versus HGG discrimination performance obtained with three feature sets: the raw features extracted without normalization, the whole set of MRI-reliable features, and the restricted set of robust features. Table 2 summarizes the classification performance in terms of AUC. For the combination of all modalities, the best performance was achieved with the whole set of MRI-reliable features. Predictive power decreases when the raw features are used as input, and decreases further when only the restricted set of robust features is considered.
      Table 2. Comparison of the performance, reported in terms of AUC, obtained in the LGG vs HGG classification using three feature sets. These are the raw features, the whole set of MRI-reliable features (extracted with the best normalization method, Norm_Brainstem, and gray-level discretization, 128 bins), and a restricted set of robust features rated as excellently robust based on the Intra-Normalization and Intra-Discretization ICC.
      Modality | All raw features | Robust features | All MRI-reliable features
      T1 | 0.73 ± 0.05 | 0.67 ± 0.07 | 0.69 ± 0.04
      T1-Gd | 0.89 ± 0.05 | 0.78 ± 0.07 | 0.93 ± 0.05
      T2 | 0.76 ± 0.08 | 0.74 ± 0.09 | 0.75 ± 0.06
      FLAIR | 0.76 ± 0.08 | 0.63 ± 0.08 | 0.76 ± 0.06
      All modalities | 0.88 ± 0.08 | 0.83 ± 0.08 | 0.93 ± 0.05

      Discussion

      Radiomics is a quantitative analytical method to extract information from medical images, and machine learning is typically used to correlate radiomic features with patient-specific data related to outcomes [Gillies et al., Radiomics: Images are more than pictures, they are data]. MRI typically generates non-quantitative contrast-based images, so applying Radiomics to MRI requires image normalization strategies and appropriate image intensity discretization parameters. Previous studies have investigated the effect of image normalization in Radiomics and found that appropriate normalization helps to improve radiomic feature reliability and to increase ML predictive power [Saltybaeva et al., Robustness of radiomic features in magnetic resonance imaging for patients with glioblastoma: Multi-center study; Chirra et al., Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI; Hoebel et al., Radiomics repeatability pitfalls in a scan-rescan MRI study of glioblastoma; Fatania et al., Intensity standardization of MRI prior to radiomic feature extraction for artificial intelligence research in glioma: a systematic review; Isaksson et al., Effects of MRI image normalization techniques in prostate cancer radiomics]. Despite this, there is no consensus on the optimal method for enhancing feature reproducibility [Lambin et al., Radiomics: Extracting more information from medical images using advanced feature analysis]. Moreover, some studies found that intensity discretization affects the repeatability of radiomic features extracted from clinical MRI data [Molina-García et al., Influence of gray level and space discretization on brain tumor heterogeneity measures obtained from magnetic resonance images; Duron et al., Gray-level discretization impacts reproducible MRI radiomics texture features; Shiri et al., Repeatability of radiomic features in magnetic resonance imaging of glioblastoma: Test-retest and image registration analyses].
      In this paper we showed how image preprocessing (intensity normalization) and feature extraction settings (grey level discretization) affect both the predictive power in glioma grade classification and the robustness of radiomic features. Our results suggest that, without normalization, texture features (AUC = 0.90 ± 0.06) are more predictive than intensity features (AUC = 0.84 ± 0.05). This finding is consistent with Tian et al. [Tian et al., Radiomics strategy for glioma grading using texture features from multiparametric MRI: Radiomics approach for glioma grading], who found that texture features were more effective than histogram statistics in glioma grade prediction. However, we showed that properly normalized intensity features reach even better performance (AUC = 0.92 ± 0.04). In particular, the Norm_RobustScaler and Norm_Brainstem normalization approaches improve classification performance. A possible reason is that normalizations based on the median and inter-quartile range are less sensitive to outlier values in the image intensity distributions. They therefore make the intensity distributions of MRI data more comparable and avoid confounding effects. Image discretization is another important step before radiomic feature estimation. Different numbers of intensity discretization levels (with a fixed bin count approach) resulted in different classification performance. In particular, texture features performed best at the intermediate numbers of discretization levels (16 and 128 bins). This can be attributed to the extreme settings discretizing the intensities into too few (8 bins) or too many (512 bins) levels, which makes the texture features less effective at capturing the difference between low-grade and high-grade gliomas. These findings agree with Ortiz-Ramón et al. [Ortiz-Ramón et al., Glioblastomas and brain metastases differentiation following an MRI texture analysis-based radiomics approach]. They proposed a 2D texture analysis combined with machine learning techniques on T1-weighted images to differentiate between brain metastases and glioblastomas, and found that the optimal intensity discretization level was 128 bins. Moreover, texture features are not influenced by the normalization strategy. This can be explained by the fact that they are not directly based on the intensity value of each voxel in the ROI but are derived from the inter-relationships between two or more voxels of the ROI. Conversely, intensity features are not influenced by the intensity discretization settings. This is expected because varying the number of intensity discretization levels does not affect the shape of the distribution of voxel intensity values, on which the first-order statistics features are based.
      In general, evaluating the robustness of radiomic features and selecting a subset of reliable features is an important step in radiomic analysis. It serves both feature reduction, which benefits machine learning algorithms, and the definition of a radiomic signature that can be used as an MRI-based biomarker of a disease condition. In this work, the robustness of the radiomic features was investigated statistically by calculating the ICC when varying the normalization strategy (Intra-Normalization ICC) or the number of intensity discretization levels (Intra-Discretization ICC). In this way, we identified a subset of the most robust features, corresponding to about 16% of all features.
      Finally, we directly compared the classification performance obtained using different sets of features. With the subset of the most robust features, the results are fairly high (AUC = 0.83 ± 0.08). However, they are outperformed by those achieved using the set of all MRI-reliable features (AUC = 0.93 ± 0.05). The lower performance of the robust subset is likely due to robustness and predictive power being distinct properties. The ICC analysis used to select the robust features is based on a statistical index that does not consider the subjects' labels, so it acts as an unsupervised feature selection step. In addition, the MRI-reliable features outperform the raw features (AUC = 0.88 ± 0.08). This indicates that normalization plays an important role in predictive power by reducing the variability due to the non-quantitative nature of MRI.
      The main limitation of our study is related to the issues that arise when working with small datasets. In this work we considered two multi-center datasets. Due to their relatively small sizes, it was not possible to carry out a stratified study considering each center individually. Moreover, the datasets were originally collected for a segmentation challenge and there is a class imbalance between HGG and LGG cases. This further prevents stratification according to additional parameters (e.g., age, sex). Another possible limitation to the application of our pipeline is the presence of image artifacts that affect the radiomic features. Two complementary strategies could be implemented to handle these situations properly. The first is to prepend to the ML algorithm a dedicated outlier-detection module that identifies the cases the ML algorithm cannot process because similar cases were not represented in the training phase. The second is to process the original images of each “special case” identified in this way with a dedicated preprocessing pipeline that excludes from the analysis the image areas altered by artifacts, and then to run the standard analysis pipeline on the curated images. Finally, the influence of the acquisition protocol (e.g., scanner type or acquisition settings) on image properties needs specific investigation to further understand the behavior of radiomic features. Likewise, the preprocessing techniques implemented in this study represent only a few of the available options and are not exhaustive.

      Conclusion

      This study described a reliable feature extraction pipeline for multiparametric MRI data. It also identified a subset of features that are robust to different intensity normalization techniques and different grey level discretization settings. This subset of robust features yielded stable classification performance in the low-grade versus high-grade glioma discrimination (AUC = 0.83 ± 0.08), regardless of the settings chosen for image normalization and discretization of image intensity levels. Moreover, glioma grade discrimination improves when the set of features defined as MRI-reliable is used (AUC = 0.93 ± 0.05). Preparing the MRI-reliable feature set requires appropriate intensity normalization strategies, because the MRI acquisition sequences typically used in clinical exams are non-quantitative. Normalizing the image intensity to that of an unaffected region of the image is recommended, although it can be demanding to implement. A valid, easy-to-implement alternative is to standardize the image intensity to the median and inter-quartile range of the grey level intensity values of the whole brain.
      Our results highlight that image normalization is a fundamental step in MRI analysis via Radiomics and Machine Learning. Because of its strong impact on the performance of an ML classifier, special attention should be paid to image preprocessing before radiomic analysis is performed.

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgments

      This work has been carried out within the Artificial Intelligence in Medicine (next_AIM, https://www.pi.infn.it/aim) project funded by INFN (CSN5) and within the FAIR-AIM project funded by Tuscany Government (POR FSE 2014-2020).

      Appendix A. Supplementary data

      The following are the Supplementary data to this article:

      References

        • Papadimitroulas P.
        • Brocki L.
        • Chung N.
        • Marchadour W.
        • Vermet F.
        • Gaubert L.
        • et al.
        Artificial intelligence: Deep learning in oncological radiomics and challenges of interpretability and data harmonization.
        Phys Med. 2021; 83: 108-121. https://doi.org/10.1016/j.ejmp.2021.03.009
        • Castiglioni I.
        • Rundo L.
        • Codari M.
        • Di Leo G.
        • Salvatore C.
        • Interlenghi M.
        • et al.
        AI applications to medical images: From machine learning to deep learning.
        Phys Med. 2021; 83: 9-24. https://doi.org/10.1016/j.ejmp.2021.02.006
        • Avanzo M.
        • Wei L.
        • Stancanello J.
        • Vallières M.
        • Rao A.
        • Morin O.
        • et al.
        Machine and deep learning methods for radiomics.
        Med Phys. 2020; 47: e185-e202. https://doi.org/10.1002/mp.13678
        • Bos P.
        • Brekel M.
        • Taghavirazavizadeh M.
        • Gouw Z.
        • Al-Mamgani A.
        • Waktola S.
        • et al.
        Largest diameter delineations can substitute 3D tumor volume delineations for radiomics prediction of human papillomavirus status on MRIs of oropharyngeal cancer.
        Phys Med. 2022; 101: 36-43. https://doi.org/10.1016/j.ejmp.2022.07.004
        • Yang Y.
        • Zheng B.
        • Li Y.
        • Li Y.
        • Ma X.
        Computer-aided diagnostic models to classify lymph node metastasis and lymphoma involvement in enlarged cervical lymph nodes using PET/CT.
        Med Phys. 2022. https://doi.org/10.1002/mp.15901
        • Ieko Y.
        • Kadoya N.
        • Sugai Y.
        • Mouri S.
        • Umeda M.
        • Tanaka S.
        • et al.
        Assessment of a computed tomography-based radiomics approach for assessing lung function in lung cancer patients.
        Phys Med. 2022; 101: 28-35. https://doi.org/10.1016/j.ejmp.2022.07.003
        • Tang Y.
        • Che X.
        • Wang W.
        • Su S.
        • Nie Y.
        • Yang C.
        Radiomics model based on features of axillary lymphatic nodes to predict axillary lymphatic node metastasis in breast cancer.
        Med Phys. 2022. https://doi.org/10.1002/mp.15873
        • Zegers C.
        • Posch J.
        • Traverso A.
        • Eekers D.
        • Postma A.
        • Backes W.
        • et al.
        Current applications of deep-learning in neuro-oncological MRI.
        Phys Med. 2021; 83: 161-173. https://doi.org/10.1016/j.ejmp.2021.03.003
        • Vamvakas A.
        • Williams S.
        • Theodorou K.
        • Kapsalaki E.
        • Fountas K.
        • Kappas C.
        • et al.
        Imaging biomarker analysis of advanced multiparametric MRI for glioma grading.
        Phys Med. 2019; 60: 188-198. https://doi.org/10.1016/j.ejmp.2019.03.014
        • Louis D.N.
        • Perry A.
        • Wesseling P.
        • Brat D.J.
        • Cree I.A.
        • Figarella-Branger D.
        • et al.
        The 2021 WHO classification of tumors of the central nervous system: a summary.
        Neuro Oncol. 2021; 23: 1231-1251. https://doi.org/10.1093/neuonc/noab106
        • Bahar R.C.
        • Merkaj S.
        • Cassinelli Petersen G.I.
        • Tillmanns N.
        • Subramanian H.
        • Brim W.R.
        • et al.
        Machine learning models for classifying high- and low-grade gliomas: A systematic review and quality of reporting analysis.
        Front Oncol. 2022; 12. https://doi.org/10.3389/fonc.2022.856231
        • Lambin P.
        • Rios Velazquez E.
        • Leijenaar R.
        • Carvalho S.
        • Stiphout R.
        • Granton P.
        • et al.
        Radiomics: Extracting more information from medical images using advanced feature analysis.
        Eur J Cancer. 2012; 48: 441-446. https://doi.org/10.1016/j.ejca.2011.11.036
        • Gillies R.
        • Kinahan P.
        • Hricak H.
        Radiomics: Images are more than pictures, they are data.
        Radiology. 2015; 278: 151169. https://doi.org/10.1148/radiol.2015151169
        • Traverso A.
        • Wee L.
        • Dekker A.
        • Gillies R.
        Repeatability and reproducibility of radiomic features: A systematic review.
        Int J Radiat Oncol Biol Phys. 2018; 102. https://doi.org/10.1016/j.ijrobp.2018.05.053
        • Mitchell-Hay R.
        • Ahearn T.
        • Murray A.
        • Waiter G.
        Investigation of the inter- and intrascanner reproducibility and repeatability of radiomics features in T1-weighted brain MRI.
        J Magn Reson Imaging. 2022. https://doi.org/10.1002/jmri.28191
        • Fornacon-Wood I.
        • Ackermann C.
        • Blackhall F.
        • Mcpartlin A.
        • Price G.
        • Faivre-Finn C.
        • et al.
        Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform.
        Eur Radiol. 2020; 30. https://doi.org/10.1007/s00330-020-06957-9
        • Schwier M.
        • van Griethuysen J.
        • Vangel M.
        • Pieper S.
        • Peled S.
        • Tempany C.
        • et al.
        Repeatability of multiparametric prostate MRI radiomics features.
        Sci Rep. 2019; 9. https://doi.org/10.1038/s41598-019-45766-z
        • Saltybaeva N.
        • Tanadini-Lang S.
        • Vuong D.
        • Burgermeister S.
        • Mayinger M.
        • Bink A.
        • et al.
        Robustness of radiomic features in magnetic resonance imaging for patients with glioblastoma: Multi-center study.
        Phys Imaging Radiat Oncol. 2022; 22: 131-136. https://doi.org/10.1016/j.phro.2022.05.006
        • Hoebel K.
        • Patel J.
        • Beers A.
        • Chang K.
        • Singh P.
        • Brown J.
        • et al.
        Radiomics repeatability pitfalls in a scan-rescan MRI study of glioblastoma.
        Radiol Artif Intell. 2020; 3: e190199. https://doi.org/10.1148/ryai.2020190199
        • Chirra P.
        • Leo P.
        • Yim M.
        • Bloch B.N.
        • Rastinehad A.R.
        • Purysko A.
        • et al.
        Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI.
        In: Medical Imaging 2018: Computer-Aided Diagnosis; vol. 10575. SPIE. https://doi.org/10.1117/12.2293992
        • Um H.
        • Tixier F.
        • Bermudez D.
        • Deasy J.
        • Young R.
        • Veeraraghavan H.
        Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets.
        Phys Med Biol. 2019; 64. https://doi.org/10.1088/1361-6560/ab2f44
        • Clark K.W.
        • Vendt B.A.
        • Smith K.E.
        • Freymann J.B.
        • Kirby J.S.
        • Koppel P.
        • et al.
        The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository.
        J Digit Imaging. 2013; 26: 1045-1057. https://doi.org/10.1007/s10278-013-9622-7
        • Bakas S.
        • Akbari H.
        • Sotiras A.
        • Bilello M.
        • Rozycki M.
        • Kirby J.S.
        • et al.
        Segmentation labels for the pre-operative scans of the TCGA-GBM collection [data set].
        2017. https://doi.org/10.7937/K9/TCIA.2017.KLXWJJ1Q
        • Bakas S.
        • Akbari H.
        • Sotiras A.
        • Bilello M.
        • Rozycki M.
        • Kirby J.S.
        • et al.
        Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.
        Sci Data. 2017; 4: 170117. https://doi.org/10.1038/sdata.2017.117
        • Menze B.
        • Jakab A.
        • Bauer S.
        • Kalpathy-Cramer J.
        • Farahani K.
        • Kirby J.
        • et al.
        The multimodal brain tumor image segmentation benchmark (BRATS).
        IEEE Trans Med Imaging. 2014; 99. https://doi.org/10.1109/TMI.2014.2377694
        • Van Griethuysen J.J.
        • Fedorov A.
        • Parmar C.
        • Hosny A.
        • Aucoin N.
        • Narayan V.
        • et al.
        Computational radiomics system to decode the radiographic phenotype.
        Cancer Res. 2017; 77: e104-e107. https://doi.org/10.1158/0008-5472.CAN-17-0339
        • Zwanenburg A.
        • Vallières M.
        • Abdalah M.A.
        • Aerts H.J.W.L.
        • Andrearczyk V.
        • Apte A.
        • et al.
        The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping.
        Radiology. 2020; 295: 328-338. https://doi.org/10.1148/radiol.2020191145
        pyradiomics documentation: Radiomic features.
        https://pyradiomics.readthedocs.io/en/latest/features.html
        • Breiman L.
        Random forests.
        1999.
        • Metz C.E.
        Receiver operating characteristic analysis: A tool for the quantitative evaluation of observer performance and imaging systems.
        J Am Coll Radiol. 2006; 3: 413-422. https://doi.org/10.1016/j.jacr.2006.02.021
        • Pedregosa F.
        • Varoquaux G.
        • Gramfort A.
        • Michel V.
        • Thirion B.
        • Grisel O.
        • et al.
        Scikit-learn: Machine learning in Python.
        J Mach Learn Res. 2011; 12: 2825-2830. https://doi.org/10.48550/arXiv.1201.0490
        • Vallat R.
        Pingouin: statistics in python.
        J Open Source Software. 2018; 3(31): 1026. https://doi.org/10.21105/joss.01026
        • Liljequist D.
        • Elfving B.
        • Skavberg Roaldsen K.
        Intraclass correlation – a discussion and demonstration of basic features.
        PLoS One. 2019; 14: 1-35. https://doi.org/10.1371/journal.pone.0219854
        • Koo T.K.
        • Li M.Y.
        A guideline of selecting and reporting intraclass correlation coefficients for reliability research.
        J Chiropr Med. 2016; 15: 155-163. https://doi.org/10.1016/j.jcm.2016.02.012
        • Fatania K.
        • Mohamud F.
        • Clark A.
        • Nix M.
        • Short S.
        • O’Connor J.
        • et al.
        Intensity standardization of MRI prior to radiomic feature extraction for artificial intelligence research in glioma – a systematic review.
        Eur Radiol. 2022; : 1-12. https://doi.org/10.1007/s00330-022-08807-2
        • Isaksson L.J.
        • Raimondi S.
        • Botta F.
        • Pepa M.
        • Gugliandolo S.G.
        • De Angelis S.P.
        • et al.
        Effects of MRI image normalization techniques in prostate cancer radiomics.
        Phys Med. 2020; 71: 7-13. https://doi.org/10.1016/j.ejmp.2020.02.007
        • Molina-García D.
        • Pérez Beteta J.
        • Martínez-González A.
        • Martino J.
        • Velásquez C.
        • Arana E.
        • et al.
        Influence of gray level and space discretization on brain tumor heterogeneity measures obtained from magnetic resonance images.
        Comput Biol Med. 2016; : 78. https://doi.org/10.1016/j.compbiomed.2016.09.011
        • Duron L.
        • Balvay D.
        • Vande Perre S.
        • Bouchouicha A.
        • Savatovsky J.
        • Sadik J.C.
        • et al.
        Gray-level discretization impacts reproducible MRI radiomics texture features.
        PLoS One. 2019; 14: e0213459
        • Shiri I.
        • Hajianfar G.
        • Sohrabi A.
        • Abdollahi H.
        • Shayesteh S.
        • Geramifar P.
        • et al.
        Repeatability of radiomic features in magnetic resonance imaging of glioblastoma: Test-retest and image registration analyses.
        Med Phys. 2020; : 47. https://doi.org/10.1002/mp.14368
        • Tian Q.
        • Yan L.F.
        • Zhang X.
        • Zhang X.
        • Hu Y.C.
        • Han Y.
        • et al.
        Radiomics strategy for glioma grading using texture features from multiparametric MRI: Radiomics approach for glioma grading.
        J Magn Reson Imaging. 2018; : 48. https://doi.org/10.1002/jmri.26010
        • Ortiz-Ramón R.
        • Ruiz-España S.
        • Mollá-Olmos E.
        • Moratal D.
        Glioblastomas and brain metastases differentiation following an MRI texture analysis-based radiomics approach.
        Phys Med. 2020 Aug; 76: 44-54. https://doi.org/10.1016/j.ejmp.2020.06.016
      5. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS, et al. Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection [Data set]. 2017. https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF.