Research Article | Volume 110, 102591, June 2023


An interpretable feature-learned model for overall survival classification of High-Grade Gliomas

Published: April 29, 2023. DOI: https://doi.org/10.1016/j.ejmp.2023.102591

      Highlights

      • Proposed unified framework for glioma survival classification & visual interpretation.
      • Proposed an integrated modality-specific and modality-concatenated system.
      • Presented in-depth analysis of designing the classification network.
      • Generation of attention maps over different convolutional levels & non-predicted labels.

      Abstract

      Purpose

Accurate and reliable survival prediction for High-Grade Gliomas (HGGs) is indispensable because of their high incidence and aggressiveness. Therefore, this paper presents a unified framework for fully automatic overall survival classification and its interpretation.

      Methods and materials

Initially, a glioma detection model is utilized to detect the tumorous images. A pre-processing module is designed for extracting 2D slices and creating a survival data array for the classification network. Then, the classification pipeline is integrated with two separate pathways: a modality-specific and a modality-concatenated pathway. The modality-specific pathway runs three separate CNNs for extracting rich predictive features from the three sub-regions of HGGs (peritumoral edema, enhancing tumor, and necrosis) using three neuro-imaging modalities. In this pathway, the image vectors of the different modalities are also concatenated into the final fusion layer to overcome the loss of lower-level tumor features. Furthermore, to exploit the intra-modality correlations, a modality-concatenated pathway is also added to the classification pipeline. The experiments are conducted on the BraTS 2018 and BraTS 2019 benchmarks, demonstrating that the proposed approach performs competitively in classifying HGG patients into three survival groups, namely, short, mid, and long survivors.

      Results

The proposed approach achieves an overall classification accuracy, sensitivity, and specificity of about 0.998, 0.997, and 0.999, respectively, for the BraTS 2018 dataset; for BraTS 2019, the corresponding values are 1.000, 0.999, and 0.999.

      Conclusions

      The results indicate that the proposed model achieves the highest values of the evaluation metrics for the overall survival classification of HGG.


      1. Introduction

Gliomas are the most prevalent type of brain malignancy and are categorized as high-grade glioma (HGG) and lower-grade glioma (LGG) according to their pathological evaluation [Inda et al., Glioblastoma multiforme: A look inside its heterogeneous nature]. Glioblastoma Multiforme (GBM), a grade IV glioma, is mainly characterized by high heterogeneity, mortality, recurrence and low survival chances [Chaddad et al., Novel radiomic features based on joint intensity matrices for predicting glioblastoma patient survival time]. Despite intensive treatment, the survival of GBM patients remains dismal, with a median survival of around 14 to 17 months [Naser and Deen, Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images]. Therefore, timely and robust overall survival prediction is of utmost value for diagnosis and for planning effective treatment for GBM patients. The survival time of GBM is assessed using magnetic resonance imaging (MRI), the most predominantly used imaging technique, along with its four modalities [Barbieri et al., A deep learning approach for magnetic resonance fingerprinting: Scaling capabilities and good training practices investigated by simulations]. The four MRI modalities, namely T1, contrast-enhanced T1 (T1-CE), T2-weighted, and Fluid Attenuated Inversion Recovery (FLAIR), deliver distinct phenotype and tumor characteristics, which supports the evaluation of survival times of GBM patients [Zegers et al., Current applications of deep-learning in neuro-oncological MRI]. Moreover, the multi-view capability of these modalities in the different planes, namely axial, coronal and sagittal, highlights the distinct sub-constituents of GBM and elevates the prediction performance.
Prediction models can usefully be divided into three types, based on the method used for feature extraction: (i) handcrafted, (ii) deep learning, and (iii) a hybrid of (i) and (ii). The first category follows the traditional way of extracting handcrafted features with prognostic significance from the MRI modalities. These features are explicitly designed and incorporate first-order statistics, tumor shape, geometry and various textural properties [Aurna et al., A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models]. Although the total count of these features may approach tens of thousands, handcrafted features remain low-order and do not fully characterize GBM heterogeneity.
Moreover, it is quite challenging to further optimize the feature extraction and feature reduction algorithms [Manco et al., Basic of machine learning and deep learning in imaging for medical physicists]. In the second category, overall survival is assessed with deep learning-based models that are capable of learning high-order, discriminative image features. In the third category, handcrafted features, deep features and some clinical parameters are concatenated into a single feature vector to enhance the prediction capability.
Also, the mentioned approaches for survival prediction can be either 2D or 3D, depending on the application. Although deep learning strategies have come close to mimicking human perception, their non-transparent nature makes it difficult to understand how a particular result is obtained. Moreover, it is challenging to convince healthcare professionals to accept the predictions of CNNs, and they frequently question the process that produced the results. Therefore, in medical applications like predicting patient survival, it is essential to perceive the rationale behind the network's prediction and to discover the reasoning behind these results [Castiglioni et al., AI applications to medical images: From machine learning to deep learning].
Explainable AI has proven to be a boon for interpreting the black-box nature of deep learning models and has gained significant interest in the medical analysis community [Castiglioni et al., AI applications to medical images: From machine learning to deep learning]. This interpretability enables researchers to recognize the patterns acquired by the model and to validate whether they are consistent with the expertise of healthcare professionals [Papadimitroulas et al., Artificial intelligence: Deep learning in oncological radiomics and challenges of interpretability and data harmonization].
The interpretability process is classified into two categories, i.e., trainable attention and post-hoc attention [Zeineldin et al., Explainability of deep neural networks for MRI analysis of brain tumors]. Trainable attention generates attention maps at the time of model training, whereas post-hoc attention generates the attention maps after training, on a trained model with fixed parameters. Another way of visualizing the model's behavior is to obtain feature maps of a few internal layers and try to decipher the information flow from low level to high level. Class Activation Maps (CAM) [Singh et al., Explainable deep learning models in medical image analysis] and Gradient-weighted Class Activation Maps (Grad-CAM) [Banerjee S, Mitra S, Shankar BU, Automated 3D segmentation of brain tumor using visual saliency, Inf. Sci. 2018;424:337–353] are popular approaches for obtaining heatmaps via gradient calculation to unveil the hidden details of convolutional layers.
The central aim of this paper is to achieve a highly reliable and automatic overall survival prediction model for HGG patients while addressing four main challenges: (1) Most of the existing prediction models are 3D and do not take into consideration the small size of the different BraTS databases; it is also not viable to extract the full potential of 3D deep learning models from such a small sample size. (2) Many imaging-based prognosis studies deployed traditional machine learning algorithms that could not fully extract the rich predictive patterns embedded in the multi-modal neuroimages. (3) Until now, only single- or dual-pathway models have been employed for the HGG survival prediction task, limiting the full exploration of multiple modalities; employing a multi-pathway CNN-based prediction framework is therefore beneficial for extracting more abstract and high-level details. (4) None of the survival prediction works has generated visual explanations of the overall survival classification results; integrating a visual interpretability framework with the classification task is therefore a prerequisite for highlighting the regions responsible for survival prediction.
Based on the aforementioned challenges, our complete interpretable survival prediction framework offers the following multi-fold contributions.
      • An end-to-end unified framework for overall survival classification and visual interpretation is introduced.
• An integrated modality-specific and modality-concatenated system is proposed which incorporates the benefits of both. A modality-specific pathway is adopted for each MRI modality to independently acquire the full characterization of important regions. To include intra-modality correlations, a modality-concatenated pathway is also used.
      • Modifications in the existing multi-path and single-path classification models are also highlighted.
      • An in-depth analysis of designing and evaluation of the classification network is also presented.
• The classification outcomes are interpreted and validated by generating attention maps over different convolutional levels and non-predicted labels.
      • This is the first study that provides visual interpretability of the overall survival classification model, outperforming the existing classification approaches significantly.
      The remainder of the paper is structured as follows: Section 2 highlights previous and ongoing research in the field of HGG survival prediction. Section 3 discusses the methodology adopted for the classification and interpretation of the overall survival of HGG. Section 4 depicts the experimental results and analysis. Lastly, the conclusions of the work are discussed in Section 5.

      2. Related work

Overall survival prediction of glioma patients from multi-modal MRIs has achieved widespread recognition in the last decade. According to the literature, most glioma survival prediction analyses are based on the extraction of radiomics features from the input MRI. Puybareau et al. [Segmentation of gliomas and prediction of patient overall survival: a simple and fast procedure] developed a feature-based survival prediction system based on the extraction of location and size features of the tumor region, achieving an overall accuracy of only 61%. Huang et al. [Overall Survival Prediction for Gliomas Using a Novel Compound Approach] proposed a hybrid combination of radiological and deep learning-based features to predict the survival outcomes. The authors exploited wavelet, shape, texture and intensity-based features along with a CNN deep feature extractor. Some of the other recent glioma survival prediction works are highlighted in Table 1.
Table 1. A brief note on the recently employed feature-based and feature-learned methods for overall survival classification of HGG.

| Author | Approach | Type | Deep learning architecture | Dataset & performance metrics | Pros/Cons |
| --- | --- | --- | --- | --- | --- |
| Bhadani et al. (Fuzzy volumetric delineation of brain tumor and survival prediction) | Feature-based | 3D | - | Local dataset, 29 patients; Accuracy: 68.4% | Utilized only volume-based features; extremely small dataset |
| Lao et al. (A Deep Learning-Based Radiomics Model for Prediction of Survival in Glioblastoma Multiforme, Sci. Rep. 7:1–8) | Feature-based + Feature-learned | 2D | Pretrained CNN | Local dataset, 112 patients; C-index: 0.710 | Limited exploration of better deep learning architectures and evaluation criteria |
| Huang et al. (Overall Survival Prediction for Gliomas Using a Novel Compound Approach) | Feature-based + Feature-learned | 3D | Single-path CNN | BraTS 2019, BraTS 2020; RMSE: 311.5 | Did not explore ways to overcome the loss of spatial content caused by 3D max-pooling |
| Pei et al. (Context aware deep learning for brain tumor segmentation, subtype classification, and survival prediction using radiology images) | Feature-learned | 3D | Single-path CNN | BraTS 2019; Accuracy: 58% | No modifications presented in the CNN for achieving better classification outcomes |
| Banerjee et al. (Multi-planar spatial-ConvNet for segmentation and survival prediction in brain cancer) | Feature-based + Feature-learned | 2D | Multilayer perceptron with 2 hidden layers | BraTS 2018; Accuracy: 58%, MSE: 180959.4 | Low model performance in comparison to other 2D methods |
| Nie et al. (Multi-Channel 3D Deep Feature Learning for Survival Time Prediction of Brain Tumor Patients Using Multi-Modal Neuroimages) | Feature-learned + SVM | 3D | Four-path CNN | Local dataset, 29 patients; Accuracy: 90.66% | Achieved better classification results |
| Puybareau et al. (Segmentation of gliomas and prediction of patient overall survival: a simple and fast procedure) | Feature-based | 2D | - | BraTS 2018; Accuracy: 61% | Focused only on extracting brain tumor location and size |
| Fu et al. (Survival prediction of patients suffering from glioblastoma based on two-branch DenseNet using multi-channel features) | Feature-learned | 2D | Dual-path CNN | BraTS 2018; Accuracy: 94% | Better accuracies obtained but did not explore the shortcomings of dual-path CNN models |
| Kao et al. (Brain tumor segmentation and tractographic feature extraction from structural MR images for overall survival prediction) | Feature-based | 3D | - | BraTS 2018; Accuracy: 70% | Utilized a different approach by extracting tractographic features |
| Mossa and Çevik (Ensemble learning of multiview CNN models for survival time prediction of brain tumor patients using multimodal MRI scans) | Feature-learned | 2D | Ensemble of six CNNs | BraTS 2017; Accuracy: 92.9% | Computationally expensive approach |
However, none of these methodologies have studied the underlying relationship of integrating a multi-modality path and a single path with the different modalities concatenated as channels. Furthermore, according to our findings, none of the works in Table 1 addresses visual interpretability for the overall survival prediction of HGG; a few reported visual explanations of deep learning models only for glioma segmentation tasks. Zeineldin et al. [Explainability of deep neural networks for MRI analysis of brain tumors] implemented various interpretability methods such as Vanilla gradient [Reyes et al., On the interpretability of artificial intelligence in radiology: Challenges and opportunities], Guided backpropagation (GBP), Grad-CAM, Guided Grad-CAM and Smooth-Grad for tumor segmentation and tumor-grade classification on the BraTS 2019 dataset. According to their findings, the heatmaps generated by the Grad-CAM and Guided Grad-CAM methods were the most interpretable, as these maps were the least noisy. Also, the GBP method highlighted only the boundaries of the different glioma sub-regions instead of the complete tumor area, making it less explainable.
Saleem et al. [Visual interpretability in 3D brain tumor segmentation network] produced 3D saliency maps using a Grad-CAM extension for glioma segmentation on the BraTS 2018 dataset. The authors also identified the significance of each MRI modality in glioma segmentation using visual explanations of their implemented model. Moreover, they proposed a deletion metric for the quantitative estimation of the interpretability task. Incorporating the visual interpretability pipeline and the classification system in a unified structure will therefore motivate experienced professionals to gain trust in such computer-aided diagnosis (CAD) systems.

      3. Proposed method

The complete framework for the interpretable survival prediction task is highlighted in Fig. 1. The proposed framework integrates four blocks, namely, the pre-processing modules, glioma detection, an overall survival classification pipeline and a visual interpretability pipeline. The pre-processing modules are utilized for converting 3D MRI volumes to 2D arrays and forming a survival data array. The glioma detection module is utilized to detect the tumorous slices in the complete datasets. The overall survival classification pipeline combines two pathways, i.e., a modality-specific and a modality-concatenated pathway, each with a distinct way of extracting survival information from the MRI modalities. The working of each block is explained in detail as follows:
Fig. 1 A feature-learned framework and its visual interpretability for the overall survival classification of HGG patients. The input to the complete model is a set of 3D multi-modal MRI volumes and survival information. The output of the classification pipeline is the predicted class, and the output of the interpretability pipeline is the generated visual explanation of the classification result. The overall survival classification pipeline comprises modality-specific and modality-concatenated streams. The last convolutional layer of the modality-concatenated pathway defines the visual interpretability pipeline.

      3.1 Pre-processing modules and glioma detection

Since the dataset is in the Neuroimaging Informatics Technology Initiative (NIfTI) format, 2-dimensional slices are extracted independently from the different 3D MRI volumes using the nibabel package. The unimportant or background pixel information is eliminated so that the neural network can learn the essential features of the brain and tumor regions. This is accomplished by cropping each slice to pixel coordinates 40 to 200 along both axes, resulting in an image size of 160 × 160. Then Z-score normalization is conducted to enhance MRI data uniformity and suppress undesirable artifacts in the modalities; this produces zero mean and unit variance for each MRI modality. Subsequently, a glioma detection model is employed to detect the tumorous slices and ignore the healthy ones. The detection model is the same as the classification model, except that the age vector is not included and the last dense layer has a single unit. The detection step is included because, in real-life scenarios, there is no guarantee that the model always classifies images based on the actual tumoral regions. The tumorous slices selected by the glioma detection system are then fed to the classification system along with the output of pre-processing module 2.
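To make the slice extraction and normalization concrete, the following is a minimal Python sketch of pre-processing module 1 (not the authors' released code); the function name and the axis handling are illustrative assumptions, while the 40–200 crop window and the per-modality Z-score follow the description above.

```python
import numpy as np
import nibabel as nib

def extract_normalized_slices(volume_path):
    """Load one 3D NIfTI modality, crop the background, and Z-score normalize."""
    volume = nib.load(volume_path).get_fdata()        # BraTS volumes are 240 x 240 x 155
    cropped = volume[40:200, 40:200, :]                # keep coordinates 40-200 -> 160 x 160 slices
    normalized = (cropped - cropped.mean()) / (cropped.std() + 1e-8)   # zero mean, unit variance
    # Move the slice axis to the front: (num_slices, 160, 160)
    return np.transpose(normalized, (2, 0, 1)).astype(np.float32)
```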
In pre-processing module 2, the overall survival array is created by assigning the survival information of the corresponding patient to each detected slice. Following this, label encoding is implemented by assigning labels as depicted in Fig. 2. This survival data array and all modalities are then passed to the next block, i.e., the overall survival classification pipeline. Thresholds of 300 and 450 days are chosen to demarcate the three survival classes, as defined in the BraTS 2018 and BraTS 2019 challenges [Pei et al., Context aware deep learning for brain tumor segmentation, subtype classification, and survival prediction using radiology images].
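A minimal sketch of the label encoding in pre-processing module 2 is shown below; the 0/1/2 class coding and function names are illustrative assumptions, while the 300- and 450-day thresholds follow the text.

```python
import numpy as np

def encode_survival_class(survival_days):
    """Map a patient's overall survival (in days) to a class label:
    short (< 300), mid (300-450) and long (> 450) survivors."""
    if survival_days < 300:
        return 0      # short survivor
    if survival_days <= 450:
        return 1      # mid survivor
    return 2          # long survivor

def build_survival_array(num_tumor_slices, survival_days):
    """Assign the patient-level survival class to every detected tumorous slice."""
    return np.full(num_tumor_slices, encode_survival_class(survival_days), dtype=np.int64)
```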
Fig. 2 The outline of pre-processing modules 1 and 2. The inputs are the 3D volumes of FLAIR, T2, T1 and T1-CE and the survival information of each patient. The outputs are the extracted 160 × 160 2D images and a survival data array containing the survival class of each slice. Z-score normalization is performed only for the modalities. The green colored block represents the input data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

      3.2 Overall survival classification pipeline

Deep learning-based survival prediction models comprise stacked convolutional and dense layers that represent a hierarchy of features ranging from low-level to high-level. Most of these feature-learned works employ multiple pathways, one for each modality, and concatenate the multi-modal features into a single vector at the end. This helps in obtaining modality-specific feature representations, but such multi-pathway models do not account for the wide gap between the lower-level and higher-level feature sets. Another type of feature-learned model employs a single path with concatenated modalities to capture correlations among the different modalities, but these methods may underperform when the training data is small and they lack modality-specific features. Considering these downsides, an overall survival classification framework is proposed with hybrid modality-specific and modality-concatenated pathways for a more sophisticated representation of predictive features. The details are discussed as follows:

      3.2.1 Modality-specific pathway

HGG typically possesses three sub-constituents, which have been investigated with utmost significance in predicting the overall survival time of HGG patients [Yang et al., Evaluation of tumor-derived MRI-texture features for discrimination of molecular subtypes and prediction of 12-month survival status in glioblastoma]. These sub-constituents of HGG exhibit high morphological complexity and can easily be distinguished in different MRI modalities. Each modality facilitates the detection of unique attributes of some regions of the tumor. Therefore, it is essential to employ a separate pathway for each MRI modality to unravel the high-level features of the different tumor parts.
In the proposed model, three distinct paths for FLAIR, T1-CE and T2 are deployed, as these modalities provide the best information about the tumor regions. Each modality is individually passed through a series of convolutional operations to generate a modality-specific feature vector. First, a 3 × 3 convolutional layer with 8 filters is included, followed by batch normalization and a ReLU activation function. The subsequent two blocks use 16 and 32 filters in their convolutional layers, respectively. Also, a spatial dropout layer is utilized instead of standard dropout, as spatial dropout drops a complete 2D feature map when pixels in neighbouring feature maps are strongly correlated.
In the existing multi-pathway models, the features from the multi-modal paths are concatenated at the end of the network's layers. The main limitation of these models is that they do not combine the low-level feature representations that provide many tumor details such as edges or blobs [Tang et al., MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients, arXiv:2206.06267, 2022]. Moreover, these low-level features may also contain some small tumor sub-structures that need to be preserved for survival feature extraction. Hence, to overcome this pitfall, the input arrays of the three modalities are flattened individually and concatenated to the processed feature vector, and this merged vector, carrying both local and global information, is processed by the subsequent layers (see the sketch below). Both low-level and high-level relevance are thereby reflected effectively.
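The following Keras sketch illustrates one modality-specific branch as described above (8/16/32 filters, batch normalization, ReLU, spatial dropout, and the flattened image vector concatenated to the branch features); the dropout rate and helper names are assumptions rather than the authors' exact configuration.

```python
from tensorflow.keras import Input, layers

def conv_block(x, filters, strides=1):
    """3 x 3 convolution -> batch normalization -> ReLU."""
    x = layers.Conv2D(filters, 3, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def modality_specific_branch(shape=(160, 160, 1), dropout_rate=0.2):
    """One per-modality pathway (FLAIR, T1-CE or T2): 8/16/32 filters, stride-2
    convolutions in place of max-pooling, spatial dropout, and the flattened
    raw image vector concatenated to the branch features to keep low-level detail."""
    inp = Input(shape=shape)
    x = conv_block(inp, 8)
    x = conv_block(x, 16, strides=2)
    x = conv_block(x, 32, strides=2)
    x = layers.SpatialDropout2D(dropout_rate)(x)
    features = layers.Flatten()(x)
    image_vector = layers.Flatten()(inp)       # low-level (raw) image information
    return inp, layers.Concatenate()([features, image_vector])
```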

      3.2.2 Modality-concatenated pathway

In the modality-concatenated pathway, FLAIR, T1-CE, T2, and T1 are concatenated as channels of a single input for extracting high-dimensional and significant predictive patterns. This concatenation of the different modalities is essential to gain intra-modality correlation and helps in extracting the survival-related features from the four modalities collectively. Similar to the modality-specific pathway, the input is processed through a few convolutional layers with 8, 16, and 32 filters, respectively. These deep learning-based operations are performed on the multiple 2D channels to collectively extract the multi-modal information. At the end of the network, the flattened feature vectors from the modality-specific and modality-concatenated pathways are aggregated to form a complete high-dimensional feature set. Later, the age vector is also integrated to perform the final survival class analysis, i.e., short, mid, and long.
As observed by Tang et al. [MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients, arXiv:2206.06267, 2022], such modality-concatenated networks may underperform if the training data is limited in size. To cope with this, 2D slices are extracted from the 3D volumes to mitigate the small sample size, as explained in the pre-processing module. The other modification is that a stride-2 convolutional layer is used instead of a max-pooling layer: every second and third convolutional layer in both the modality-specific and modality-concatenated networks uses a stride of 2. This replacement is made because max-pooling discards the exact location of a feature, whereas strided convolution still detects multiple patterns in the input through its receptive fields [Malhotra et al., A novel compound-based loss function for glioma segmentation with deep learning]. Replacing max-pooling with strided convolution also incurs no overall loss of accuracy on several image-related tasks. A sketch of the fusion stage follows below.
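The sketch below illustrates how the modality-concatenated branch and the final fusion with the age vector could be assembled in Keras, reusing the helpers from the previous sketch; the dense-layer width of 128 is an assumption, and the stride-2 convolutions replace max-pooling as described.

```python
from tensorflow.keras import Input, Model, layers

def build_classification_model():
    """Fuse three modality-specific branches, the modality-concatenated branch
    and the age vector into a three-class survival classifier (short/mid/long)."""
    branch_inputs, branch_features = [], []
    for _ in range(3):                          # FLAIR, T1-CE and T2 branches
        inp, feat = modality_specific_branch()
        branch_inputs.append(inp)
        branch_features.append(feat)

    # Modality-concatenated branch: FLAIR, T1-CE, T2 and T1 stacked as 4 channels.
    concat_inp = Input(shape=(160, 160, 4))
    x = conv_block(concat_inp, 8)
    x = conv_block(x, 16, strides=2)            # stride-2 convolution replaces max-pooling
    x = conv_block(x, 32, strides=2)
    concat_features = layers.Flatten()(x)

    age_inp = Input(shape=(1,))                 # patient age joined before the final layers
    fused = layers.Concatenate()(branch_features + [concat_features, age_inp])
    fused = layers.Dense(128, activation="relu")(fused)   # width of 128 is an assumption
    out = layers.Dense(3, activation="softmax")(fused)

    return Model(inputs=branch_inputs + [concat_inp, age_inp], outputs=out)
```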

      3.3 Visual interpretability pipeline

The integrated visual interpretability pipeline is also highlighted in Fig. 1, and its working is explained in Algorithm 1. As the input modalities are forward propagated through the classification network, visual explanations are produced to interpret the result of the deep learning-based classification network. The interpretability approach is post-hoc, i.e., visual explanations are generated on the trained model and therefore do not impact the classification accuracy. For this, the activation maps of the last convolutional layer of the modality-concatenated network are extracted, and the gradients of the top predicted class are computed with respect to these activation maps. In the gradient-based approach, visualizations of the non-predicted classes, in addition to the top predicted one, are also incorporated. This is performed by selecting the value of the output neuron corresponding to the particular class we want to visualize. The output neuron is selected from the resultant vector obtained from the classification network, which specifies the probability of each survival class. While generating heatmaps or saliency maps of the top predicted class helps in understanding the rationale behind the model's predictions, comparing these heatmaps with those of the non-predicted classes gives valid proof of the model's correct behavior.
After the gradient computation, the gradient matrix is reduced across each feature-map channel to produce a vector whose size equals the number of feature maps of the convolutional layer to be visualized. Each entry of the reduced gradient vector is then multiplied with the corresponding activation map of the last convolutional layer to weight the importance of each activation map, and their weighted sum is obtained. Since the dimensions of the generated map are the same as those of the last layer's activation maps, it is resized to the input image size. Only values greater than 0 are retained, and finally, for visualization purposes, min–max scaling is applied. The obtained heatmaps thus provide a better means of interpreting the regions that are important for survival classification and also validate the working of the proposed classification approach.
Algorithm 1:
Visual Interpretability:
C_output = activation maps of the last convolutional layer to be visualized
G_Cam = zero array with the spatial dimensions of C_output
PredT = tensor of prediction probabilities for all classes
T = index of the maximum probability in PredT
O_neuron = PredT[:, T] # output value of the top predicted class for the input
For generating heatmaps of the short survivor class: O_neuron = PredT[:, 0] # output value of class 1
For generating heatmaps of the mid survivor class: O_neuron = PredT[:, 1] # output value of class 2
For generating heatmaps of the long survivor class: O_neuron = PredT[:, 2] # output value of class 3
C_grad = Gradients(O_neuron, C_output) # gradients of the selected class score with respect to C_output
W_grad = Mean(C_grad, axis = (0, 1)) # average the gradients spatially, one weight per feature map
For index in range(number of feature maps) do
    G_Cam += W_grad[index] * C_output[:, :, index] # weight each activation map and accumulate the sum
End For
G_Cam = resize(G_Cam, (160, 160)) # resize the heatmap to the input image dimensions
G_Cam = ReLU(G_Cam) # eliminate values below 0
G_Cam = (G_Cam - G_Cam.min()) / (G_Cam.max() - G_Cam.min()) # min–max normalization
H_cam = Visualize(G_Cam) # visualize the heatmap
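For reference, Algorithm 1 corresponds to the following runnable TensorFlow sketch; the layer-name argument and the handling of multi-input models are assumptions that must match the trained classification model.

```python
import tensorflow as tf

def grad_cam(model, inputs, last_conv_layer_name, class_index=None):
    """Gradient-based class activation map for one set of model inputs."""
    conv_layer = model.get_layer(last_conv_layer_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(inputs)
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))    # top predicted survival class
        class_score = preds[:, class_index]

    grads = tape.gradient(class_score, conv_out)       # d(score) / d(activation maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))     # average the gradients spatially
    cam = tf.reduce_sum(weights * conv_out[0], axis=-1) # weighted sum of activation maps
    cam = tf.nn.relu(cam)                               # keep only positive evidence
    cam = tf.image.resize(cam[..., tf.newaxis], (160, 160))[..., 0].numpy()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # min-max normalization
```

Here `last_conv_layer_name` would be the name of the last convolutional layer of the modality-concatenated branch, and passing `class_index` 0, 1 or 2 produces the heatmap of a non-predicted class, as in Algorithm 1.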

      4. Experiments and results

      4.1 Data and implementation details

The classification results are evaluated on the BraTS 2018 and BraTS 2019 benchmarks, which are available online with multi-modal MRI images and survival information. BraTS 2018 and BraTS 2019 comprise a total of 163 and 210 HGG patients with survival information, respectively. Each HGG patient also includes a set of four MRI modalities (FLAIR, T1, T1-CE, T2) represented as 3D volumes in NIfTI format. The dataset visualizations before and after pre-processing for both datasets are highlighted in Fig. 3(a) and Fig. 4(a). The scatter plots represent the distribution of the overall survival time for each patient in the short, mid and long survival groups.
Fig. 3 Data visualization in BraTS 2018. (a) Scatter plot representation of the number of patients in each survival class before pre-processing. Short survivors: 65, Mid survivors: 42, Long survivors: 56. (b) Bar representation of the number of samples in each survival class after pre-processing. Short survivors: 4593, Mid survivors: 2846, Long survivors: 3425. 'Sample' here signifies a single image slice.
Fig. 4 Data visualization in BraTS 2019. (a) Scatter plot representation of the number of patients in each survival class before pre-processing. Short survivors: 81, Mid survivors: 54, Long survivors: 75. (b) Bar representation of the number of samples in each survival class after pre-processing. Short survivors: 5605, Mid survivors: 3568, Long survivors: 4637. 'Sample' here signifies a single image slice.
After the extraction of 2D slices, the total number of samples in each survival class is represented as bar plots in Fig. 3(b) and Fig. 4(b). After pre-processing, the number of samples in each survival class increases considerably.
The overall survival classification is implemented in Python 3.7 with Keras and TensorFlow as the backend, on a Xeon 3.7 GHz machine with 128 GB RAM, CUDA 11.2, cuDNN 8.1 and a 16 GB NVIDIA Quadro RTX 5000 GPU. For the evaluation of the classification results, a batch size of 10, 20 epochs and the Adam optimizer are used, along with a 5-fold cross-validation technique.
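A minimal sketch of this training protocol is given below; `build_classification_model`, `inputs` (a list of per-modality slice arrays, the 4-channel array and the age array) and `labels` are assumed to be prepared as in the preceding sections and are not part of the original text.

```python
from sklearn.model_selection import StratifiedKFold

# 5-fold cross-validation with batch size 10, 20 epochs and the Adam optimizer.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kfold.split(inputs[0], labels):
    model = build_classification_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit([x[train_idx] for x in inputs], labels[train_idx],
              batch_size=10, epochs=20,
              validation_data=([x[test_idx] for x in inputs], labels[test_idx]))
```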

      4.2 Evaluation metrics

To examine the performance of the overall survival classification task, accuracy, sensitivity, and specificity are employed as quantitative evaluation metrics. A detailed description of these metrics is provided in the cited references.
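For completeness, the standard one-vs-rest definitions of these metrics can be computed from a confusion matrix as sketched below; this is an illustrative computation, with the cited references remaining the authoritative description.

```python
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, num_classes=3):
    """Per-class accuracy, sensitivity and specificity using one-vs-rest counts."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
    total = cm.sum()
    results = {}
    for c in range(num_classes):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        results[c] = {
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        }
    return results
```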

      4.3 Result analysis

In this section, firstly, an ablation study for designing the network configuration is conducted, with result visualizations using radar plots. Secondly, attention maps are generated using a gradient-based approach for better classification interpretability in two ways: in the first, attention over different convolutional layers is studied to understand the top-down learning behavior of the deep learning-based model; in the second, the attention of the classification model over predicted and non-predicted labels is evaluated. Finally, the proposed work is compared with recent methodologies on both datasets.

      4.3.1 Ablation study for designing the network architecture for overall survival classification

To verify the influence of the proposed classification architecture, a series of ablation experiments have been conducted on both datasets. We initially obtained a detection accuracy, sensitivity, and specificity of 100% for the glioma detection system and then employed the classification system on the tumor-detected slices. For the classification model, a single path in the modality-specific pathway is first constructed to individually observe the classification metrics with the FLAIR, T1-CE, T2, and T1 modalities. For the 2nd architecture, a dual-pathway model is formed incorporating all possible pair-wise combinations. For the 3rd architecture, three different pathways are considered (with all possible combinations) to determine the survival class for HGG. In the 4th architecture, four different pathways (FLAIR, T1-CE, T2, and T1) are formed to evaluate their impact on the classification accuracies. For the 5th architecture, two modality-concatenated pathways are examined: in the first, only FLAIR, T1-CE, and T2 are concatenated, whereas in the second, all modalities are concatenated to form a 4-channel image. After this, the best of the 1st–4th architectures and the 5th architecture are integrated into a single pipeline to obtain the 6th architecture. Finally, the image vectors are added to the 6th architecture to obtain the proposed architecture.
As highlighted in Table 2, Table 3, Table 4, Table 5, FLAIR gives the top results for a single path, whereas for the dual- and three-path systems, FL + T1-CE and FL + T1-CE + T2 produce the best classification outcomes, respectively. The results of the four-path system are not particularly impressive. Therefore, only the FLAIR, T1-CE, and T2 modalities are considered for the modality-specific pathway. Also, the concatenation of FLAIR, T1-CE, T2, and T1 performs better than the FLAIR, T1-CE, and T2 concatenation for the modality-concatenated pathway. Hence, for the proposed system, the combined network (modality-specific and modality-concatenated) is used together with the image vectors in the modality-specific pathway. A 5-fold cross-validation scheme is utilized for all the above-mentioned experiments.
Table 2. Training result evaluation of various architectures on Dataset 2018.

| Design | Modalities | Acc C1 | Acc C2 | Acc C3 | Acc Ovr | Sen C1 | Sen C2 | Sen C3 | Sen Ovr | Spe C1 | Spe C2 | Spe C3 | Spe Ovr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single path | FL | 0.870 | 0.901 | 0.888 | 0.886 | 0.845 | 0.905 | 0.854 | 0.861 | 0.882 | 0.923 | 0.851 | 0.885 |
| | T1-CE | 0.836 | 0.830 | 0.829 | 0.832 | 0.841 | 0.835 | 0.832 | 0.836 | 0.846 | 0.831 | 0.833 | 0.837 |
| | T2 | 0.810 | 0.800 | 0.808 | 0.806 | 0.821 | 0.819 | 0.812 | 0.817 | 0.814 | 0.813 | 0.821 | 0.816 |
| | T1 | 0.721 | 0.710 | 0.725 | 0.719 | 0.730 | 0.725 | 0.711 | 0.722 | 0.723 | 0.714 | 0.722 | 0.720 |
| Dual path | FL + T1-CE | 0.895 | 0.888 | 0.893 | 0.892 | 0.895 | 0.900 | 0.901 | 0.899 | 0.902 | 0.891 | 0.892 | 0.895 |
| | FL + T1 | 0.892 | 0.880 | 0.891 | 0.888 | 0.896 | 0.871 | 0.883 | 0.883 | 0.890 | 0.882 | 0.888 | 0.887 |
| | T1 + T2 | 0.872 | 0.868 | 0.870 | 0.870 | 0.886 | 0.861 | 0.864 | 0.870 | 0.873 | 0.869 | 0.870 | 0.871 |
| | FL + T2 | 0.879 | 0.860 | 0.869 | 0.869 | 0.874 | 0.860 | 0.861 | 0.865 | 0.866 | 0.862 | 0.870 | 0.866 |
| | T1-CE + T2 | 0.851 | 0.842 | 0.846 | 0.846 | 0.859 | 0.855 | 0.841 | 0.852 | 0.840 | 0.851 | 0.848 | 0.846 |
| | T1 + T1-CE | 0.833 | 0.829 | 0.828 | 0.830 | 0.830 | 0.828 | 0.831 | 0.830 | 0.833 | 0.820 | 0.819 | 0.824 |
| Three path | FL + T1-CE + T2 | 0.950 | 0.942 | 0.943 | 0.945 | 0.945 | 0.938 | 0.951 | 0.945 | 0.933 | 0.924 | 0.941 | 0.933 |
| | T1 + T1-CE + T2 | 0.932 | 0.933 | 0.933 | 0.933 | 0.927 | 0.920 | 0.932 | 0.926 | 0.937 | 0.929 | 0.933 | 0.933 |
| | FL + T1 + T2 | 0.933 | 0.921 | 0.922 | 0.925 | 0.930 | 0.930 | 0.929 | 0.930 | 0.921 | 0.928 | 0.930 | 0.926 |
| | FL + T1-CE + T1 | 0.911 | 0.922 | 0.919 | 0.917 | 0.910 | 0.924 | 0.912 | 0.915 | 0.911 | 0.900 | 0.909 | 0.907 |
| Four path | T1 + T2 + T1-CE + FL | 0.949 | 0.937 | 0.931 | 0.939 | 0.933 | 0.921 | 0.939 | 0.931 | 0.931 | 0.926 | 0.931 | 0.929 |
| Modality concatenated | FL + T2 + T1-CE | 0.911 | 0.894 | 0.901 | 0.902 | 0.951 | 0.882 | 0.851 | 0.894 | 0.900 | 0.891 | 0.911 | 0.901 |
| | FL + T2 + T1-CE + T1 | 0.920 | 0.919 | 0.922 | 0.920 | 0.919 | 0.899 | 0.879 | 0.899 | 0.899 | 0.912 | 0.895 | 0.902 |
| Combined network | - | 0.979 | 0.971 | 0.976 | 0.975 | 0.999 | 0.934 | 0.969 | 0.967 | 0.988 | 0.957 | 0.971 | 0.972 |
| Combined network with image vector (Proposed) | - | 0.999 | 0.997 | 0.998 | 0.998 | 0.996 | 0.998 | 0.999 | 0.997 | 0.999 | 0.998 | 0.997 | 0.999 |

Acc = accuracy, Sen = sensitivity, Spe = specificity; C1/C2/C3 = Class 1/2/3, Ovr = overall.
5-fold cross-validated results are represented.
Combined network = Three path + modality concatenated (FL + T2 + T1-CE + T1).
Proposed = Combined network + image vectors.
Table 3. Testing result evaluation of various architectures on Dataset 2018.

| Design | Modalities | Acc C1 | Acc C2 | Acc C3 | Acc Ovr | Sen C1 | Sen C2 | Sen C3 | Sen Ovr | Spe C1 | Spe C2 | Spe C3 | Spe Ovr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single path | FL | 0.860 | 0.880 | 0.855 | 0.875 | 0.832 | 0.885 | 0.842 | 0.853 | 0.843 | 0.890 | 0.825 | 0.885 |
| | T1-CE | 0.829 | 0.831 | 0.819 | 0.826 | 0.829 | 0.831 | 0.830 | 0.830 | 0.826 | 0.821 | 0.822 | 0.823 |
| | T2 | 0.800 | 0.781 | 0.789 | 0.790 | 0.811 | 0.810 | 0.802 | 0.808 | 0.816 | 0.801 | 0.791 | 0.803 |
| | T1 | 0.720 | 0.700 | 0.715 | 0.712 | 0.722 | 0.721 | 0.700 | 0.714 | 0.711 | 0.709 | 0.729 | 0.716 |
| Dual path | FL + T1-CE | 0.885 | 0.869 | 0.873 | 0.876 | 0.865 | 0.899 | 0.899 | 0.888 | 0.909 | 0.871 | 0.873 | 0.884 |
| | FL + T1 | 0.891 | 0.870 | 0.880 | 0.880 | 0.886 | 0.869 | 0.880 | 0.878 | 0.889 | 0.880 | 0.878 | 0.882 |
| | T1 + T2 | 0.870 | 0.848 | 0.867 | 0.862 | 0.866 | 0.871 | 0.853 | 0.863 | 0.859 | 0.849 | 0.861 | 0.856 |
| | FL + T2 | 0.869 | 0.850 | 0.848 | 0.856 | 0.863 | 0.859 | 0.851 | 0.858 | 0.866 | 0.862 | 0.870 | 0.866 |
| | T1-CE + T2 | 0.851 | 0.842 | 0.846 | 0.846 | 0.859 | 0.855 | 0.841 | 0.852 | 0.840 | 0.844 | 0.847 | 0.844 |
| | T1 + T1-CE | 0.834 | 0.818 | 0.827 | 0.826 | 0.826 | 0.821 | 0.825 | 0.824 | 0.823 | 0.811 | 0.809 | 0.814 |
| Three path | FL + T1-CE + T2 | 0.949 | 0.941 | 0.933 | 0.941 | 0.933 | 0.937 | 0.933 | 0.943 | 0.930 | 0.913 | 0.939 | 0.927 |
| | T1 + T1-CE + T2 | 0.922 | 0.933 | 0.927 | 0.927 | 0.911 | 0.916 | 0.929 | 0.919 | 0.920 | 0.919 | 0.921 | 0.920 |
| | FL + T1 + T2 | 0.935 | 0.911 | 0.919 | 0.922 | 0.921 | 0.924 | 0.921 | 0.922 | 0.901 | 0.914 | 0.920 | 0.912 |
| | FL + T1-CE + T1 | 0.900 | 0.920 | 0.909 | 0.910 | 0.911 | 0.921 | 0.902 | 0.911 | 0.900 | 0.890 | 0.900 | 0.897 |
| Four path | T1 + T2 + T1-CE + FL | 0.927 | 0.920 | 0.915 | 0.921 | 0.925 | 0.900 | 0.911 | 0.912 | 0.919 | 0.905 | 0.915 | 0.913 |
| Modality concatenated | FL + T2 + T1-CE | 0.905 | 0.891 | 0.891 | 0.895 | 0.923 | 0.889 | 0.831 | 0.881 | 0.899 | 0.887 | 0.900 | 0.895 |
| | FL + T2 + T1-CE + T1 | 0.909 | 0.900 | 0.860 | 0.906 | 0.900 | 0.877 | 0.889 | 0.899 | 0.879 | 0.892 | 0.851 | 0.887 |
| Combined network | - | 0.964 | 0.968 | 0.955 | 0.945 | 0.991 | 0.914 | 0.951 | 0.952 | 0.953 | 0.939 | 0.962 | 0.959 |
| Combined network with image vector (Proposed) | - | 0.987 | 0.982 | 0.998 | 0.989 | 0.997 | 0.988 | 0.988 | 0.997 | 0.999 | 0.997 | 0.987 | 0.999 |

Acc = accuracy, Sen = sensitivity, Spe = specificity; C1/C2/C3 = Class 1/2/3, Ovr = overall.
5-fold cross-validated results are represented.
Combined network = Three path + modality concatenated (FL + T2 + T1-CE + T1).
Proposed = Combined network + image vectors.
Best values for each category are represented in bold.
Table 4. Training result evaluation of various architectures on Dataset 2019.

| Design | Modalities | Acc C1 | Acc C2 | Acc C3 | Acc Ovr | Sen C1 | Sen C2 | Sen C3 | Sen Ovr | Spe C1 | Spe C2 | Spe C3 | Spe Ovr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single path | FL | 0.879 | 0.911 | 0.881 | 0.890 | 0.860 | 0.909 | 0.842 | 0.870 | 0.881 | 0.941 | 0.861 | 0.894 |
| | T1-CE | 0.836 | 0.830 | 0.829 | 0.832 | 0.841 | 0.835 | 0.832 | 0.836 | 0.846 | 0.831 | 0.833 | 0.837 |
| | T2 | 0.819 | 0.822 | 0.801 | 0.814 | 0.827 | 0.819 | 0.811 | 0.819 | 0.824 | 0.810 | 0.833 | 0.822 |
| | T1 | 0.738 | 0.700 | 0.714 | 0.717 | 0.740 | 0.729 | 0.731 | 0.733 | 0.729 | 0.716 | 0.721 | 0.722 |
| Dual path | FL + T1-CE | 0.931 | 0.897 | 0.911 | 0.913 | 0.891 | 0.909 | 0.899 | 0.900 | 0.912 | 0.911 | 0.902 | 0.908 |
| | FL + T1 | 0.922 | 0.889 | 0.911 | 0.907 | 0.899 | 0.888 | 0.893 | 0.893 | 0.911 | 0.913 | 0.881 | 0.902 |
| | T1 + T2 | 0.871 | 0.870 | 0.880 | 0.874 | 0.899 | 0.871 | 0.869 | 0.880 | 0.867 | 0.889 | 0.871 | 0.876 |
| | FL + T2 | 0.871 | 0.861 | 0.878 | 0.870 | 0.878 | 0.868 | 0.862 | 0.869 | 0.876 | 0.869 | 0.880 | 0.875 |
| | T1-CE + T2 | 0.865 | 0.839 | 0.844 | 0.849 | 0.853 | 0.865 | 0.832 | 0.850 | 0.830 | 0.860 | 0.857 | 0.849 |
| | T1 + T1-CE | 0.844 | 0.821 | 0.822 | 0.829 | 0.839 | 0.829 | 0.838 | 0.835 | 0.843 | 0.829 | 0.810 | 0.827 |
| Three path | FL + T1-CE + T2 | 0.966 | 0.941 | 0.953 | 0.953 | 0.955 | 0.942 | 0.959 | 0.952 | 0.923 | 0.901 | 0.921 | 0.915 |
| | T1 + T1-CE + T2 | 0.952 | 0.932 | 0.949 | 0.944 | 0.936 | 0.921 | 0.922 | 0.926 | 0.941 | 0.923 | 0.923 | 0.929 |
| | FL + T1 + T2 | 0.939 | 0.934 | 0.920 | 0.931 | 0.938 | 0.944 | 0.939 | 0.940 | 0.926 | 0.918 | 0.911 | 0.918 |
| | FL + T1-CE + T1 | 0.913 | 0.916 | 0.900 | 0.910 | 0.923 | 0.929 | 0.922 | 0.925 | 0.929 | 0.901 | 0.903 | 0.911 |
| Four path | T1 + T2 + T1-CE + FL | 0.955 | 0.927 | 0.951 | 0.944 | 0.942 | 0.919 | 0.903 | 0.921 | 0.930 | 0.937 | 0.937 | 0.935 |
| Modality concatenated | FL + T2 + T1-CE | 0.933 | 0.911 | 0.912 | 0.918 | 0.962 | 0.909 | 0.861 | 0.910 | 0.908 | 0.899 | 0.918 | 0.908 |
| | FL + T2 + T1-CE + T1 | 0.946 | 0.923 | 0.930 | 0.933 | 0.929 | 0.912 | 0.903 | 0.914 | 0.909 | 0.911 | 0.887 | 0.932 |
| Combined network | - | 0.988 | 0.979 | 0.988 | 0.978 | 0.999 | 0.953 | 0.956 | 0.969 | 0.988 | 0.955 | 0.988 | 0.977 |
| Combined network with image vector (Proposed) | - | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.998 | 0.999 | 0.999 | 0.999 | 0.998 | 0.997 | 0.999 |

Acc = accuracy, Sen = sensitivity, Spe = specificity; C1/C2/C3 = Class 1/2/3, Ovr = overall.
5-fold cross-validated results are represented.
Combined network = Three path + modality concatenated (FL + T2 + T1-CE + T1).
Proposed = Combined network + image vectors.
Best values for each category are represented in bold.
Table 5. Testing result evaluation of various architectures on Dataset 2019.

| Design | Modalities | Acc C1 | Acc C2 | Acc C3 | Acc Ovr | Sen C1 | Sen C2 | Sen C3 | Sen Ovr | Spe C1 | Spe C2 | Spe C3 | Spe Ovr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single path | FL | 0.861 | 0.889 | 0.871 | 0.866 | 0.850 | 0.892 | 0.855 | 0.877 | 0.871 | 0.927 | 0.852 | 0.883 |
| | T1-CE | 0.830 | 0.821 | 0.828 | 0.833 | 0.844 | 0.825 | 0.831 | 0.826 | 0.822 | 0.833 | 0.831 | 0.829 |
| | T2 | 0.799 | 0.811 | 0.791 | 0.820 | 0.821 | 0.820 | 0.819 | 0.800 | 0.813 | 0.800 | 0.829 | 0.814 |
| | T1 | 0.721 | 0.705 | 0.701 | 0.726 | 0.727 | 0.719 | 0.732 | 0.709 | 0.719 | 0.722 | 0.721 | 0.721 |
| Dual path | FL + T1-CE | 0.926 | 0.877 | 0.884 | 0.895 | 0.861 | 0.889 | 0.893 | 0.886 | 0.902 | 0.910 | 0.890 | 0.901 |
| | FL + T1 | 0.891 | 0.876 | 0.901 | 0.889 | 0.896 | 0.871 | 0.883 | 0.897 | 0.901 | 0.904 | 0.891 | 0.899 |
| | T1 + T2 | 0.866 | 0.871 | 0.870 | 0.867 | 0.881 | 0.872 | 0.849 | 0.869 | 0.861 | 0.872 | 0.866 | 0.866 |
| | FL + T2 | 0.855 | 0.860 | 0.871 | 0.870 | 0.870 | 0.862 | 0.877 | 0.862 | 0.871 | 0.861 | 0.866 | 0.866 |
| | T1-CE + T2 | 0.862 | 0.831 | 0.832 | 0.842 | 0.844 | 0.861 | 0.822 | 0.842 | 0.830 | 0.855 | 0.851 | 0.845 |
| | T1 + T1-CE | 0.831 | 0.811 | 0.821 | 0.827 | 0.819 | 0.830 | 0.831 | 0.821 | 0.833 | 0.811 | 0.816 | 0.820 |
| Three path | FL + T1-CE + T2 | 0.955 | 0.939 | 0.956 | 0.945 | 0.931 | 0.932 | 0.951 | 0.944 | 0.911 | 0.899 | 0.915 | 0.908 |
| | T1 + T1-CE + T2 | 0.951 | 0.911 | 0.929 | 0.921 | 0.922 | 0.920 | 0.922 | 0.930 | 0.928 | 0.922 | 0.910 | 0.920 |
| | FL + T1 + T2 | 0.929 | 0.931 | 0.920 | 0.930 | 0.928 | 0.941 | 0.921 | 0.927 | 0.911 | 0.911 | 0.900 | 0.907 |
| | FL + T1-CE + T1 | 0.895 | 0.906 | 0.901 | 0.909 | 0.905 | 0.921 | 0.900 | 0.901 | 0.921 | 0.888 | 0.889 | 0.899 |
| Four path | T1 + T2 + T1-CE + FL | 0.944 | 0.913 | 0.930 | 0.929 | 0.941 | 0.912 | 0.899 | 0.917 | 0.930 | 0.931 | 0.933 | 0.931 |
| Modality concatenated | FL + T2 + T1-CE | 0.927 | 0.910 | 0.908 | 0.915 | 0.958 | 0.899 | 0.849 | 0.902 | 0.901 | 0.899 | 0.911 | 0.903 |
| | FL + T2 + T1-CE + T1 | 0.939 | 0.921 | 0.926 | 0.928 | 0.924 | 0.900 | 0.901 | 0.908 | 0.900 | 0.899 | 0.921 | 0.916 |
| Combined network | - | 0.951 | 0.969 | 0.989 | 0.970 | 0.981 | 0.977 | 0.926 | 0.967 | 0.951 | 0.944 | 0.987 | 0.964 |
| Combined network with image vector (Proposed) | - | 0.998 | 0.997 | 0.997 | 0.997 | 0.999 | 0.981 | 0.999 | 0.992 | 0.999 | 0.998 | 0.997 | 0.999 |

Acc = accuracy, Sen = sensitivity, Spe = specificity; C1/C2/C3 = Class 1/2/3, Ovr = overall.
5-fold cross-validated results are represented.
Combined network = Three path + modality concatenated (FL + T2 + T1-CE + T1).
Proposed = Combined network + image vectors.
Best values for each category are represented in bold.
Four main trends are observed from the above ablation study. Firstly, the three-pathway network outperforms the modality-concatenated network, because the three modalities are processed independently in the three-path system, resulting in better extraction of modality-specific attributes, whereas the modality-concatenated pathway concatenates four modalities at the beginning and all further processing occurs in a single convolutional path, leading to suboptimal classification results. Secondly, employing the T1 modality as a separate path in the four-path system does not significantly improve the metrics. However, concatenating T1 with the other three modalities (in the modality-concatenated pathway) performs better than the FLAIR, T1-CE, and T2 concatenation. This reflects the importance of employing the T1 modality in cases lacking image contrast between tumor regions.
Thirdly, integrating the modality-specific and modality-concatenated systems into a unified framework helps achieve better classification results. The modality-specific system extracts highly predictive patterns from the three sub-constituents of HGG by processing each modality individually, while the modality-concatenated framework preserves intra-modality correlations by performing convolutional operations on multiple 2D channels. Lastly, the direct fusion of the image vectors with the outputs of the respective modality-specific paths further refines the classification output by accumulating low-level feature representations.
The classification results (accuracy, sensitivity, and specificity) obtained using six different network configurations (single path, dual path, three paths, modality concatenated, combined network, and combined network with image vector) for the overall and three survival classes are illustrated in Fig. 5, Fig. 6 using radar plots. A radar plot is a 2D graphical representation of multivariate data with an arbitrary number of variables. Radar plots are also commonly known as spider plots or web plots. These plots comprise equiangular spokes called radii. Here, each radius signifies a network configuration, and the magnitude axis represents the accuracy, sensitivity, and specificity values, respectively.
Fig. 5 Radar plot representations of different model designs and classification metrics, training versus testing, for BraTS 2018. (a1, a2, a3, a4): radar representations of accuracy using the six network configurations for class 1, class 2, class 3 and overall, respectively; (b1, b2, b3, b4): radar representations of sensitivity for class 1, class 2, class 3 and overall, respectively; (c1, c2, c3, c4): radar representations of specificity for class 1, class 2, class 3 and overall, respectively. The magnitude of the classification metrics is represented on the magnitude axis from 0.8 to 1. The smaller the gap between the two colored lines on these plots, the smaller the gap between the training and the testing results. Only the best results in each network configuration are represented here.
Fig. 6 Radar plot representations of different model designs and classification metrics, training versus testing, for BraTS 2019. (a1, a2, a3, a4): radar representations of accuracy using the six network configurations for class 1, class 2, class 3 and overall, respectively; (b1, b2, b3, b4): radar representations of sensitivity for class 1, class 2, class 3 and overall, respectively; (c1, c2, c3, c4): radar representations of specificity for class 1, class 2, class 3 and overall, respectively. The magnitude of the classification metrics is represented on the magnitude axis from 0.8 to 1. The smaller the gap between the two colored lines on these plots, the smaller the gap between the training and the testing results. Only the best results in each network configuration are represented here.
Fig. 5 represents the radar plots for the BraTS 2018 database. The first row (a1–a4) depicts the class-wise and overall accuracy comparison (training versus testing) of the different network architectures, with the training and testing accuracies highlighted by the red and green colored areas of curvature, respectively. The larger the red-colored area, the wider the gap between training and testing for that particular configuration. In Fig. 5 (a1), the modality-concatenated configuration shows the largest gap between training and testing results, which is undesirable in machine learning algorithms. Similarly, in Fig. 5 (b1–b4) and (c1–c4), the sensitivity and specificity radar plots are visualized; here, the larger the blue or purple colored area, the wider the gap between training and testing. For class 1 (b1), the training and testing sensitivities almost overlap for the combined and three-path networks, whereas the dual path shows the widest gap between the two. Similarly, for (c4), the overall specificity of the dual path is the lowest, whereas the proposed model shows the best specificity outcomes. The corresponding radar plots for the BraTS 2019 dataset are highlighted in Fig. 6.

      4.3.2 Attention of classification model over convolutional layers

For a better perception of the information flow between the different layers of the network, it is important to visualize heatmaps for each classification output. In this section, gradient-based class activation maps estimate the important regions in the input MRI. The gradient-based approach color-codes the regions that have the maximum influence in predicting a particular survival class. The last convolutional layer of the modality-concatenated pathway is used to produce these saliency maps, which are generated for each of the four modalities. The results of the internal visualizations are depicted in Fig. 7, Fig. 8.
Fig. 7 Visualization of the gradient-based class activation maps for different convolutional layers in the classification model. These saliency maps depict the flow of spatial information between the internal convolutional layers. Initially, the model focuses on the entire brain region, and subsequently the attention moves toward the different tumor sub-constituents in the later layers. Blue signifies low attention and red corresponds to high attention. The red bounding box represents the tumor region. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8 Visualization of the gradient-based class activation maps for healthy images. These heatmaps are obtained for convolutional layers 1, 2 and 3, respectively. Blue signifies low attention and red corresponds to high attention. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
For the representation of the gradient-based activation maps in Fig. 7, the output feature maps of convolutional layers 1, 2, and 3 are extracted, and their gradients are estimated as discussed in Algorithm 1. In producing the survival class of each input MRI, the network acquires feature information from the three HGG sub-regions: peritumoral edema, enhancing tumor, and necrotic regions. These tumor parts possess intra-tumoral characteristics that are directly correlated with tumor progression and survival [Chaddad et al., Novel radiomic features based on joint intensity matrices for predicting glioblastoma patient survival time]. Hence, the obtained visualization maps are in line with researchers' insight, as these maps also capture the tumor-related features for survival class estimation.
Furthermore, in Fig. 7, the model initially concentrates on the entire brain region while giving less emphasis to the tumor region, but the network's attention later converges on the tumor region, thereby capturing its essential attributes. For example, in the saliency map of convolutional layer 2 (input MRI 1 and input MRI 2), the model tries to capture details of the tumor part but cannot yet precisely locate the tumor region. Subsequently, in the 3rd convolutional layer, the model can pinpoint the tumor regions for the automatic recognition of prognostic features. Input MRI 3 represents a hard case where the intensity distribution of the tumor region is similar to that of the healthy regions in all four modalities. In this case, it is difficult for the model to target the tumor part, and initially the model gives high attention to the areas with larger differences in intensity levels; at the 3rd convolutional layer, the model can locate the tumor part but still devotes some attention to healthy regions.
These experiments highlight the network's overall strategy in localizing the tumor regions responsible for evaluating the overall survival class. The network also follows a top-down strategy in localizing the tumor region, which is consistent with studies of brain tumor segmentation networks and their visualizations [Natekar et al., Demystifying Brain Tumor Segmentation Networks: Interpretability and Uncertainty Analysis].
Although only tumorous images are considered for the classification system, we also fed a few healthy images to the model and observed their saliency map visualizations in Fig. 8. As perceived from Fig. 8, the model tries to capture details of some regions of the brain without devoting much attention to any specific region. This shows the model's behavior at different convolutional layers when a healthy brain image is fed.

      4.3.3 Attention of classification model over non-predicted labels

To verify the information flow for correctly identified predicted classes, saliency maps of the non-predicted classes are also obtained. The saliency maps of a particular class are acquired by selecting the output value of that class and then calculating its gradient with respect to the selected convolutional layer output. The results are illustrated in Fig. 9. The first input MRI belongs to survival class 2. For the class 2 saliency maps, the network's attention is given to the HGG tumor part, which encodes the survival information. For the class 0 saliency maps, as highlighted in Fig. 9, the model does not focus on the tumor region; instead, it learns the features of some random brain regions. Also, for the class 1 saliency maps, the model shifts its importance to a small, distinct brain region rather than focusing on the tumor part. A similar behavior is observed for the next two MRI inputs, where unimportant regions of the brain MRI are likewise highlighted. This validates that the model learns important features of the tumor regions when the attention maps of the top predicted class are obtained, whereas for the non-predicted classes the model shifts its attention to non-specific regions of the brain.
      Fig. 9 Visualization of the gradient-based class activation maps for different predicted classes. When the actual and predicted labels are the same, the classification model learns the important characteristics of the tumor parts. When the actual and predicted labels do not match, the information learned by the model is focused on other, unimportant brain regions. Blue signifies low attention and red corresponds to high attention. The red bounding box marks the tumor region. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
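      In the same spirit, the maps of a non-predicted class in Fig. 9 differ only in the output index from which the gradient is taken. Reusing the hypothetical grad_cam helper sketched earlier (the model, convolutional layer, and input x again being placeholders):

```python
# Attention for the predicted class vs. a chosen non-predicted class (illustrative).
cam_top = grad_cam(model, model.conv3, x)                 # top predicted survival class
cam_other = grad_cam(model, model.conv3, x, class_idx=0)  # e.g. non-predicted class 0
```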

      4.3.4 Quantitative comparison with state-of-the-art methods

      Since the evaluation is carried out on the BraTS 2018 and BraTS 2019 datasets, the work is compared with the top-ranked architectures of both the BraTS 2018 and BraTS 2019 challenges. The detailed comparison for both datasets is provided as follows:

      4.3.4.1 Comparison to the State-of-the-Arts (BraTS 2018)

      Table 6 depicts the comparative evaluation of the state-of-the-art methods for survival prediction on the BraTS 2018 dataset. Puybareau et al. [
      • Puybareau E.
      • Tochon G.
      • Chazalon J.
      • Fabrizio J.
      Segmentation of gliomas and prediction of patient overall survival: a simple and fast procedure.
      ] utilized a feature-based technique for predicting the survival group of HGG patients. The method extracts tumor size and localization features from stacked 2D MRI slices. The sizes of three tumor regions, namely edema, necrosis and active tumor, along with age information, are fed to a random forest classifier, achieving an overall accuracy of 61%. Our method improves on this by a wide margin (0.99 vs. 0.61 on BraTS 2018). Sun et al. [
      • Sun L.
      • Zhang S.
      • Chen H.
      • Luo L.
      Brain tumor segmentation and survival prediction using multimodal MRI scans with deep learning.
      ] secured 2nd position in the BraTS 2018 challenge for the survival prediction task by manually extracting first-order, morphological and textural features from the MRI images. A random forest regressor is then applied to the selected set of features. Although this method employs several feature extraction techniques on 2D MRI, the resulting accuracy remains low.
      Table 6 Comparison of the recent classification methods with the proposed model for the BraTS 2018 and BraTS 2019 datasets.
      Modalities | Method | Approach | Type | Accuracy
      T2 + FL + CE | Puybareau et al.* | Feature-based | 2D | 0.61
      T1 + T2 + FL + CE | Sun et al.* | Feature-based | 2D | 0.61
      T1 + FL + CE | Cabezas et al.* | Pretrained VGG + clinical + volume features | 3D | 0.37
      T1 + T2 + FL + CE | Feng et al.* | Feature-based | 3D | 0.61
      T1 + T2 + FL + CE | Zhou et al.* | Multi-path CNN | 2D | 0.67
      T1 + T2 + FL + CE | Huang et al.# | CNN + Feature-based | 3D | 0.69
      T2 + CE | Guo et al.# | CNN | 3D | 0.59
      T1 + FL + CE | Pei et al.# | CNN | 3D | 0.59
      T1 + T2 + FL + CE | Amian et al.# | Feature-based | 3D | 0.52
      T1 + T2 + FL + CE | Yogananda et al.# | Feature-based | 3D | 0.44
      T1 + T2 + FL + CE | Proposed* | Integrated CNN | 2D | 0.99
      T1 + T2 + FL + CE | Proposed# | Integrated CNN | 2D | 1.00
      *Methods for BraTS 2018. #Methods for BraTS 2019.
      T1 + T2 + FL + CE corresponds to T1, T2, FLAIR, T1-CE.
      Further, Cabezas et al. [

      Cabezas M, et al. Survival prediction using ensemble tumor segmentation and transfer learning. arXiv Prepr. arXiv1810.04274, 2018.

      ] employed a combined feature-based and feature-learned strategy to predict the HGG survival class. First, the authors use a pre-trained CNN for automatic feature generation. Volume-related features and age information are then concatenated at the final layer of the model to predict the survival class. This method performed poorly on the test data, achieving only a 37% accuracy rate. Both Baid et al. [
      • Baid U.
      • Rane S.U.
      • Talbar S.
      • Gupta S.
      • Thakur M.H.
      • Moiyadi A.
      • et al.
      Overall Survival Prediction in Glioblastoma With Radiomic Features Using Machine Learning.
      ] and Feng et al. [
      • Feng X.
      • Tustison N.J.
      • Patel S.H.
      • Meyer C.H.
      Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features.
      ] are based on radiomic feature extraction methods that exploit imaging-based features from MRI volumes. A set of wavelet-decomposition, intensity-based, shape, and textural features is generated and then assessed for assigning the survival class of HGG patients. The accuracies of both methods reach only 61%, which is significantly lower than that of the proposed method.
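      For illustration, a transfer-learning pipeline of the kind used by Cabezas et al. can be sketched as a pretrained backbone whose pooled features are concatenated with age and tumor-volume values before a small classification head. The layer sizes and the clinical feature vector below are illustrative assumptions, not the cited authors' configuration.

```python
# Sketch: pretrained-CNN features fused with clinical/volume features (PyTorch).
import torch
import torch.nn as nn
from torchvision import models

class DeepPlusClinical(nn.Module):
    def __init__(self, n_clinical=4, n_classes=3):
        super().__init__()
        vgg = models.vgg16(weights=None)        # pretrained weights could be loaded here
        self.backbone = vgg.features            # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(512 + n_clinical, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, img, clinical):
        deep = self.pool(self.backbone(img)).flatten(1)        # deep image features
        return self.head(torch.cat([deep, clinical], dim=1))   # fuse with age/volumes

# Example call: a 3-channel slice plus [age, edema, necrosis, enhancing-tumor volume].
logits = DeepPlusClinical()(torch.randn(1, 3, 224, 224), torch.randn(1, 4))
```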
      Fu et al. [
      • Fu X.
      • Chen C.
      • Li D.
      Survival prediction of patients suffering from glioblastoma based on two-branch DenseNet using multi-channel features.
      ] employed a fully deep-learning-based approach to obtain the classification results. The authors used a dual-path CNN composed of dense blocks applied in series, with the two branches processing the T2 and T1-CE modalities, respectively, for the automatic generation of multi-channel survival features. The results obtained were comparatively better, but it is worth mentioning that our method still achieves a gain of about 6.06% in overall classification accuracy.

      4.3.4.2 Comparison to the State-of-the-Arts (BraTS 2019)

      Table 6 also presents the comparative evaluation of the state-of-the-art methods for survival prediction on the BraTS 2019 dataset. Huang et al. [
      • Huang H.
      • Zhang W.
      • Fang Y.
      • Hong J.
      • Su S.
      • Lai X.
      Overall Survival Prediction for Gliomas Using a Novel Compound Approach.
      ] presented a hybrid approach for predicting the overall survival of HGG patients. The authors extracted first-order, wavelet-based and textural features from the tumor area and also used a single-path CNN to extract deep survival features. The dimensionality of the feature vector is then reduced, and the selected set of features is fed to a random forest for predicting survival times. Although the authors utilized a 3D CNN for deep survival feature extraction, the results were not promising.
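      A compound pipeline of this kind can be outlined as follows: handcrafted and CNN-derived feature vectors are concatenated, reduced in dimensionality, and passed to a random forest. The array shapes and the PCA-based reduction below are illustrative assumptions.

```python
# Sketch: hybrid handcrafted + deep features -> dimensionality reduction -> random forest.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
handcrafted = rng.normal(size=(100, 60))    # e.g. first-order / wavelet / texture features
deep = rng.normal(size=(100, 256))          # e.g. CNN-derived survival features
y = rng.integers(0, 3, size=100)            # short / mid / long survivor labels

X = np.hstack([handcrafted, deep])
clf = make_pipeline(PCA(n_components=30),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(X, y)
print(clf.predict(X[:5]))
```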
      Guo et al. [
      • Guo X.
      • Yang C.
      • Lam P.L.
      • Woo P.Y.M.
      • Yuan Y.
      Domain knowledge based brain tumor segmentation and overall survival prediction.
      ] employed a 3D CNN with dense blocks and a transition convolutional layer to encode the survival classification characteristics. Since tumor location is of utmost value in predicting survival times of HGG patients, the authors incorporated the tumor location information into the 3D CNN; still, the overall classification accuracy did not reach the potential of a 3D CNN. Pei et al. [
      • Pei L.
      • Vidyaratne L.
      • Rahman M.M.
      • Iftekharuddin K.M.
      Context aware deep learning for brain tumor segmentation, subtype classification, and survival prediction using radiology images.
      ] proposed a unified model for segmentation, classification and survival class prediction of HGG patients on the BraTS 2019 dataset. Here, a single 3D CNN model performs the three defined tasks and achieves good segmentation accuracy. For the survival classification task, the extracted deep features are fed to a linear regressor that predicts the survival times. The main novel contribution of this method is the use of an integrated model for the three tasks; however, it could not substantially raise the classification accuracy of the survival prediction task.
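      When survival is first regressed as a continuous time, the regression output still has to be mapped onto the three survivor groups. A minimal illustration is given below; the 10- and 15-month cut-offs correspond to the commonly used BraTS convention and are stated here as an assumption.

```python
# Illustrative mapping from a regressed survival time (in days) to survivor groups.
def survival_group(predicted_days: float) -> str:
    if predicted_days < 10 * 30:       # shorter than ~10 months
        return "short"
    if predicted_days <= 15 * 30:      # between ~10 and ~15 months
        return "mid"
    return "long"                      # longer than ~15 months

print([survival_group(d) for d in (200.0, 380.0, 600.0)])   # -> ['short', 'mid', 'long']
```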
      Amian et al. [
      • Amian M.
      • Soltaninejad M.
      Multi-resolution 3D CNN for MRI brain tumor segmentation and survival prediction.
      ] and Yogananda et al. [
      • Yogananda C.G.B.
      • et al.
      Fully automated brain tumor segmentation and survival prediction of gliomas using deep learning and MRI.
      ] deployed a feature-based strategy for evaluating the overall survival class. Statistical and volume-related features were extracted using the Python pyradiomics package, followed by a random forest classifier. Both methods underperformed in terms of classification accuracy.
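      A feature-based pipeline of this type can be sketched with the pyradiomics extractor followed by a random forest; the file paths, label value, and feature selection below are placeholders rather than the cited authors' settings.

```python
# Sketch: pyradiomics feature extraction + random-forest classification.
import numpy as np
from radiomics import featureextractor          # pip install pyradiomics
from sklearn.ensemble import RandomForestClassifier

extractor = featureextractor.RadiomicsFeatureExtractor()

def case_features(image_path, mask_path, label=1):
    """Numeric radiomic feature vector for one MRI volume and its tumor mask."""
    result = extractor.execute(image_path, mask_path, label=label)
    return np.array([v for k, v in result.items() if k.startswith("original_")],
                    dtype=float)

# Placeholder training data: one (image, mask) pair and one survivor label per patient.
cases = [("patient01_flair.nii.gz", "patient01_seg.nii.gz")]
labels = [2]                                     # 0 / 1 / 2 survivor groups
X = np.vstack([case_features(img, msk) for img, msk in cases])
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, np.array(labels))
```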
      It is observed from the above state-of-the-art techniques that most survival prediction methodologies are 3D in nature. Moreover, none of these feature-learned methods combines single-path and multi-path CNNs in a unified framework, and the results obtained by most classification approaches are not acceptable in the medical domain because of their low classification accuracy. In contrast, our system, backed by a visual interpretability framework, attains the best classification outcomes and surpasses all recent methodologies.

      5. Conclusions

      This work aims to improve existing classification models for the overall survival prediction of brain tumor patients and to provide an in-depth visual interpretability analysis. A unified deep-learning-based framework is therefore proposed for the overall survival classification of HGG into three survivor groups, namely short, mid, and long survivors, together with its gradient-based visual interpretation. The overall survival classification pipeline integrates modality-specific and modality-concatenated networks to enhance the model’s classification accuracy. The modality-specific pathway concatenates features from the multiple CNN pathways, thereby emphasizing the modality-specific attributes that are essential for survival prediction; in this pathway, the image vectors of the three respective modalities are also fused with the higher-level features to bridge the gap between low-level and high-level feature representations. To preserve the correlations among the different MRI modalities, a modality-concatenated pathway is also utilized. Furthermore, since generating heatmaps only for the final classification results would not provide complete insight into the classification task, how the attention varies over different convolutional layers and over non-predicted labels is also analysed using a gradient-based approach. For both datasets, the experiments have demonstrated the highest values of the evaluation criteria compared with their counterparts. Overall, the complete system constitutes an accurate and robust model for the overall survival classification of HGG patients, which would be of great clinical value and could benefit individualized treatment planning. Future work includes the identification of features from the HGG sub-regions with prognostic significance, which can be accomplished by extracting imaging-based features from the tumor regions and performing survival analysis on them.
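      To make the two-pathway design concrete, the following is a simplified sketch of how modality-specific CNNs, a modality-concatenated CNN, and re-injected image vectors might be fused before the survival classifier. The channel sizes, depths, and fusion layout are illustrative assumptions and do not correspond to the exact configuration used in this work.

```python
# Simplified sketch of an integrated modality-specific + modality-concatenated design (PyTorch).
import torch
import torch.nn as nn

def small_cnn(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class IntegratedSurvivalNet(nn.Module):
    def __init__(self, img_size=64, n_classes=3):
        super().__init__()
        # Modality-specific pathway: one CNN per modality/sub-region.
        self.specific = nn.ModuleList([small_cnn(1) for _ in range(3)])
        # Modality-concatenated pathway: one CNN over the stacked modalities.
        self.concat = small_cnn(3)
        # Flattened image vectors re-injected at the fusion layer.
        img_vec = 3 * img_size * img_size
        self.classifier = nn.Sequential(
            nn.Linear(3 * 64 + 64 + img_vec, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                       # x: (B, 3, H, W) stacked modalities
        per_mod = [cnn(x[:, i:i + 1]) for i, cnn in enumerate(self.specific)]
        fused = torch.cat(per_mod + [self.concat(x), x.flatten(1)], dim=1)
        return self.classifier(fused)

logits = IntegratedSurvivalNet()(torch.randn(2, 3, 64, 64))   # -> (2, 3) survival-class logits
```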

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      References

        • Inda M.
        • Bonavia R.
        • Seoane J.
        Glioblastoma multiforme: A look inside its heterogeneous nature.
        Cancers. 2014; 6: 226-239https://doi.org/10.3390/cancers6010226
        • Chaddad A.
        • Daniel P.
        • Desrosiers C.
        • Toews M.
        • Abdulkarim B.
        Novel Radiomic Features Based on Joint Intensity Matrices for Predicting Glioblastoma Patient Survival Time.
        IEEE J Biomed Heal Informatics. 2019; 23: 795-804https://doi.org/10.1109/JBHI.2018.2825027
        • Naser M.A.
        • Deen M.J.
        Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images.
        Comput Biol Med. 2020; 121: 103758. https://doi.org/10.1016/j.compbiomed.2020.103758
        • Barbieri M.
        • Brizi L.
        • Giampieri E.
        • Solera F.
        • Manners D.N.
        • Castellani G.
        • et al.
        A deep learning approach for magnetic resonance fingerprinting: Scaling capabilities and good training practices investigated by simulations.
        Phys Medica. 2021; 89: 80-92
        • Zegers C.M.L.
        • Posch J.
        • Traverso A.
        • Eekers D.
        • Postma A.A.
        • Backes W.
        • et al.
        Current applications of deep-learning in neuro-oncological MRI.
        Phys Medica. 2021; 83: 161-173
        • Aurna N.F.
        • Yousuf M.A.
        • Taher K.A.
        • Azad A.K.M.
        • Moni M.A.
        A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models.
        Comput Biol Med. 2022; 146: 105539. https://doi.org/10.1016/j.compbiomed.2022.105539
        • Manco L.
        • Maffei N.
        • Strolin S.
        • Vichi S.
        • Bottazzi L.
        • Strigari L.
        Basic of machine learning and deep learning in imaging for medical physicists.
        Phys Medica. 2021; 83: 194-205https://doi.org/10.1016/j.ejmp.2021.03.026
        • Castiglioni I.
        • Rundo L.
        • Codari M.
        • Di Leo G.
        • Salvatore C.
        • Interlenghi M.
        • et al.
        AI applications to medical images: From machine learning to deep learning.
        Phys Medica. 2021; 83: 9-24
        • Papadimitroulas P.
        • Brocki L.
        • Christopher Chung N.
        • Marchadour W.
        • Vermet F.
        • Gaubert L.
        • et al.
        Artificial intelligence: Deep learning in oncological radiomics and challenges of interpretability and data harmonization.
        Phys Medica. 2021; 83: 108-121
        • Zeineldin R.A.
        • et al.
        Explainability of deep neural networks for MRI analysis of brain tumors.
        Int J Comput Assist Radiol Surg. 2022: 1-11. https://doi.org/10.1007/s11548-022-02619-x
        • Singh A.
        • Sengupta S.
        • Lakshminarayanan V.
        Explainable deep learning models in medical image analysis.
        Journal of Imaging. 2020; 6: 1-19https://doi.org/10.3390/JIMAGING6060052
      1. Banerjee S, Mitra S, Shankar BU. Automated 3D segmentation of brain tumor using visual saliency. Inf. Sci. (Ny).2018;424:337–353. 10.1016/j.ins.2017.10.011.

        • Puybareau E.
        • Tochon G.
        • Chazalon J.
        • Fabrizio J.
        Segmentation of gliomas and prediction of patient overall survival: a simple and fast procedure.
        In International MICCAI Brainlesion Workshop. 2018; : 199-209
        • Huang H.
        • Zhang W.
        • Fang Y.
        • Hong J.
        • Su S.
        • Lai X.
        Overall Survival Prediction for Gliomas Using a Novel Compound Approach.
        Front Oncol. 2021; 11: 1-20https://doi.org/10.3389/fonc.2021.724191
        • Reyes M.
        • Meier R.
        • Pereira S.
        • Silva C.A.
        • Dahlweid F.-M.
        • Tengg-Kobligk H.V.
        • et al.
        On the interpretability of artificial intelligence in radiology: Challenges and opportunities.
        Radiol Artif Intell. 2020; 2: e190043
        • Saleem H.
        • Shahid A.R.
        • Raza B.
        Visual interpretability in 3D brain tumor segmentation network.
        Comput Biol Med. 2021; 133: 1-11https://doi.org/10.1016/j.compbiomed.2021.104410
        • Bhadani S.
        • Mitra S.
        • Banerjee S.
        Fuzzy volumetric delineation of brain tumor and survival prediction.
        Soft Comput. 2020; 24: 13115-13134https://doi.org/10.1007/s00500-020-04728-8
      2. Lao J, et al. A Deep Learning-Based Radiomics Model for Prediction of Survival in Glioblastoma Multiforme. Sci Rep. 2017;7:1–8. 10.1038/s41598-017-10649-8.

        • Pei L.
        • Vidyaratne L.
        • Rahman M.M.
        • Iftekharuddin K.M.
        Context aware deep learning for brain tumor segmentation, subtype classification, and survival prediction using radiology images.
        Sci Rep. 2020; 10: 1-11https://doi.org/10.1038/s41598-020-74419-9
        • Banerjee S.
        • Mitra S.
        • Shankar B.U.
        Multi-planar spatial-ConvNet for segmentation and survival prediction in brain cancer.
        In International MICCAI Brainlesion Workshop. 2018; : 94-104
        • Nie D.
        • Lu J.
        • Zhang H.
        • Adeli E.
        • Wang J.
        • Yu Z.
        • et al.
        Multi-Channel 3D Deep Feature Learning for Survival Time Prediction of Brain Tumor Patients Using Multi-Modal Neuroimages.
        Sci Rep. 2019; 9https://doi.org/10.1038/s41598-018-37387-9
        • Fu X.
        • Chen C.
        • Li D.
        Survival prediction of patients suffering from glioblastoma based on two-branch DenseNet using multi-channel features.
        Int J Comput Assist Radiol Surg. 2021; 16: 207-217https://doi.org/10.1007/s11548-021-02313-4
        • Kao P.Y.
        • Ngo T.
        • Zhang A.
        • Chen J.W.
        • Manjunath B.S.
        Brain tumor segmentation and tractographic feature extraction from structural MR images for overall survival prediction.
        In International MICCAI Brainlesion Workshop. 2018; : 128-141
        • Mossa A.A.
        • Çevik U.
        Ensemble learning of multiview CNN models for survival time prediction of brain tumor patients using multimodal MRI scans.
        Turkish J Electr Eng Comput Sci. 2021; 29: 616-631https://doi.org/10.3906/ELK-2002-175
      3. BraTS 2018 Proceedings. https://www.cbica.upenn.edu/sbia/Spyridon.Bakas/MICCAI_BraTS/MICCAI_BraTS_2018_proceedings_shortPapers.pdf.

        • Yang D.
        • Rao G.
        • Martinez J.
        • Veeraraghavan A.
        • Rao A.
        Evaluation of tumor-derived MRI-texture features for discrimination of molecular subtypes and prediction of 12-month survival status in glioblastoma.
        Med Phys. 2015; 42: 6725-6735https://doi.org/10.1118/1.4934373
      4. Tang W, Zhang H, Yu P, Kang H, Zhang R. MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients. arXiv Prepr. arXiv2206.06267, 2022.

        • Malhotra R.
        • Saini B.S.
        • Gupta S.
        A novel compound-based loss function for glioma segmentation with deep learning.
        Optik. 2022; 265: 169443. https://doi.org/10.1016/j.ijleo.2022.169443
      5. https://en.wikipedia.org/wiki/Sensitivity_and_specificity.

      6. https://en.wikipedia.org/wiki/Accuracy_and_precision.

        • Natekar P.
        • Kori A.
        • Krishnamurthi G.
        Demystifying Brain Tumor Segmentation Networks: Interpretability and Uncertainty Analysis.
        Front Comput Neurosci. 2020; 14: 1-12https://doi.org/10.3389/fncom.2020.00006
        • Sun L.
        • Zhang S.
        • Chen H.
        • Luo L.
        Brain tumor segmentation and survival prediction using multimodal MRI scans with deep learning.
        Front Neurosci. 2019; 13: 1-9
      7. Cabezas M, et al. Survival prediction using ensemble tumor segmentation and transfer learning. arXiv Prepr. arXiv1810.04274, 2018.

        • Baid U.
        • Rane S.U.
        • Talbar S.
        • Gupta S.
        • Thakur M.H.
        • Moiyadi A.
        • et al.
        Overall Survival Prediction in Glioblastoma With Radiomic Features Using Machine Learning.
        Front Comput Neurosci. 2020; 14https://doi.org/10.3389/fncom.2020.00061
        • Feng X.
        • Tustison N.J.
        • Patel S.H.
        • Meyer C.H.
        Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features.
        Front Comput Neurosci. 2020; 14: 1-12https://doi.org/10.3389/fncom.2020.00025
        • Guo X.
        • Yang C.
        • Lam P.L.
        • Woo P.Y.M.
        • Yuan Y.
        Domain knowledge based brain tumor segmentation and overall survival prediction.
        in: International MICCAI Brainlesion Workshop. 2019: 285-295
        • Amian M.
        • Soltaninejad M.
        Multi-resolution 3D CNN for MRI brain tumor segmentation and survival prediction.
        in: International MICCAI Brainlesion Workshop. 2019: 221-230
        • Yogananda C.G.B.
        • et al.
        Fully automated brain tumor segmentation and survival prediction of gliomas using deep learning and MRI.
        in: International MICCAI Brainlesion Workshop. 2019: 99-112
        • Zhou T.
        Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients.
        in: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2020: 221-231