Deep learning dose prediction for IMRT of esophageal cancer: The effect of data quality and quantity on model performance

Published: March 10, 2021. DOI: https://doi.org/10.1016/j.ejmp.2021.02.026

      Abstract

      Purpose

      To investigate the effect of data quality and quantity on the performance of deep learning (DL) models for dose prediction in intensity-modulated radiotherapy (IMRT) of esophageal cancer.

      Material and methods

      Two databases were used: a variable database (VarDB) with 56 clinical cases extracted retrospectively, including user-dependent variability in delineation and planning, different machines and beam configurations; and a homogenized database (HomDB), created to reduce this variability by re-contouring and re-planning all patients with a fixed class-solution protocol.
      Experiment 1 analysed the user-dependent variability, using 26 patients planned with the same machine and beam setup (E26-VarDB versus E26-HomDB). Experiment 2 increased the training set by groups of 10 patients (E16, E26, E36, E46, and E56) for both databases.
      Model evaluation metrics were the mean absolute error (MAE) for selected dose-volume metrics and the global MAE for all body voxels.
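      As a rough illustration of these metrics, the sketch below computes a global MAE over body voxels, a D95-style metric and a V5-style metric with NumPy. The function names, the toy dose arrays and the percentile-based definition of D95 are assumptions made for illustration only; they are not taken from the study's implementation.

```python
import numpy as np


def global_mae(pred, truth, body_mask):
    """Mean absolute dose error (Gy) over voxels inside the body mask."""
    return float(np.mean(np.abs(pred[body_mask] - truth[body_mask])))


def d_percent(dose, mask, percent):
    """Dose (Gy) received by at least `percent` % of the structure volume."""
    # D95 is the dose exceeded by 95 % of voxels, i.e. the 5th percentile.
    return float(np.percentile(dose[mask], 100.0 - percent))


def v_dose(dose, mask, threshold_gy):
    """Percentage of the structure volume receiving at least `threshold_gy`."""
    return float(100.0 * np.mean(dose[mask] >= threshold_gy))


# Toy 3D dose grids (Gy); all values are invented for illustration.
truth = np.zeros((4, 4, 4))
truth[1:3, 1:3, 1:3] = 45.0          # a small cubic "target" at 45 Gy
pred = truth + 0.5                   # constant 0.5 Gy over-prediction
body = np.ones_like(truth, dtype=bool)
ptv = truth > 0

mae = global_mae(pred, truth, body)
d95_error = abs(d_percent(pred, ptv, 95) - d_percent(truth, ptv, 95))
v5_body = v_dose(truth, body, 5.0)
```

      In this toy case the prediction is offset by a constant 0.5 Gy, so both the global MAE and the absolute D95 error equal 0.5 Gy.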

      Results

      For Experiment 1, E26-HomDB reduced the MAE for the considered dose-volume metrics compared to E26-VarDB (e.g. reduction of 0.2 Gy for D95-PTV, 1.2 Gy for Dmean-heart or 3.3% for V5-lungs). For Experiment 2, increasing the database size slightly improved performance for HomDB models (e.g. decrease in global MAE of 0.13 Gy for E56-HomDB versus E26-HomDB), but increased the error for the VarDB models (e.g. increase in global MAE of 0.20 Gy for E56-VarDB versus E26-VarDB).

      Conclusion

      A small database may suffice to obtain good DL prediction performance, provided that homogeneous training data is used. Data variability reduces the performance of DL models, an effect that becomes more pronounced when the training set is enlarged.

      1. Introduction

      Despite the advances in radiotherapy, treatment planning is still one of the bottlenecks in clinical practice. In order to achieve the optimal radiation dose distribution for a certain patient, multiple interactions between the medical physicist or dosimetrist and the physician are needed, accompanied by an iterative manual tuning of the dose-volume objectives. This results in several hours of trial-and-error manual work. The variability in clinical expertise, together with the time limit imposed by tight clinical schedules, can sometimes lead to suboptimal radiotherapy plans that can compromise the treatment outcome [
      • Moore K.L.
      • Schmidt R.
      • Moiseenko V.
      • Olsen L.A.
      • Tan J.
      • Xiao Y.
      • et al.
      Quantifying unnecessary normal tissue complication risks due to suboptimal planning: A Secondary Study of RTOG 0126.
      ,
      • Marcello M.
      • Ebert M.
      • Haworth A.
      • Steigler A.
      • Kennedy A.
      • Joseph D.
      • et al.
      Association between treatment planning and delivery factors and disease progression in prostate cancer radiotherapy: Results from the TROG 03.04 RADAR trial.
      ]. Automating and standardizing the full treatment planning process is therefore key to improving clinical practice and ensuring the best possible treatment for each patient.
      Several strategies have been proposed to automate this manual process, the most popular ones being multi-criteria optimization (MCO) [
      • Craft D.
      • McQuaid D.
      • Wala J.
      • Chen W.
      • Salari E.
      • Bortfeld T.
      Multicriteria VMAT optimization.
      ], wish-list iterative planning [
      • Breedveld S.
      • Storchi P.R.M.
      • Voet P.W.J.
      • Heijmen B.J.M.
      iCycle: Integrated, multicriterial beam angle, and profile optimization for generation of coplanar and noncoplanar IMRT plans.
      ], and knowledge-based planning (KBP) [
      • Ge Y.
      • Wu Q.J.
      Knowledge-based planning for intensity-modulated radiation therapy: A review of data-driven approaches.
      ]. The latter aims at extracting information and/or learning from previous clinical plans, so that the knowledge contained in clinical databases is used to generate the current treatment plan. The first KBP studies [
      • Ge Y.
      • Wu Q.J.
      Knowledge-based planning for intensity-modulated radiation therapy: A review of data-driven approaches.
      ,
      • Hussein M.
      • Heijmen B.J.M.
      • Verellen D.
      • Nisbet A.
      Automation in intensity modulated radiotherapy treatment planning – A review of recent innovations.
      ] focused on predicting dose-volume histogram (DVH) metrics for the new patient, by extrapolating information extracted from similar patients contained in a database of prior clinical plans [
      • Zarepisheh M.
      • Long T.
      • Li N.
      • Tian Z.
      • Romeijn H.E.
      • Jia X.
      • et al.
      A DVH-guided IMRT optimization algorithm for automatic treatment planning and adaptive radiotherapy replanning.
      ,
      • Fogliata A.
      • Nicolini G.
      • Bourgier C.
      • Clivio A.
      • De Rose F.
      • Fenoglietto P.
      • et al.
      Performance of a knowledge-based model for optimization of volumetric modulated Arc therapy plans for single and bilateral breast irradiation.
      ,
      • Fogliata A.
      • Reggiori G.
      • Stravato A.
      • Lobefalo F.
      • Franzese C.
      • Franceschini D.
      • et al.
      RapidPlan head and neck model: The objectives and possible clinical benefit.
      ,
      • Tol J.P.
      • Delaney A.R.
      • Dahele M.
      • Slotman B.J.
      • Verbakel W.F.A.R.
      Evaluation of a knowledge-based planning solution for head and neck cancer.
      ,
      • Wu B.
      • Ricchetti F.
      • Sanguineti G.
      • Kazhdan M.
      • Simari P.
      • Jacques R.
      • et al.
      Data-driven approach to generating achievable dose-volume histogram objectives in intensity-modulated radiotherapy planning.
      ,
      • Valdes G.
      • Simone 2nd, C.B.
      • Chen J.
      • Lin A.
      • Yom S.S.
      • Pattison A.J.
      • et al.
      Clinical decision support of radiotherapy treatment planning: A data-driven machine learning strategy for patient-specific dosimetric decision making.
      ]. The similarity between the new patient and those in the database was typically measured using pre-selected (i.e. hand-crafted) features, such as the distance from organs to target. The predicted DVH metrics were later used to guide the optimization process in the treatment planning system (TPS). More recently, the rise of deep learning (DL) methods [
      • Boldrini L.
      • Bibault J.-E.
      • Masciocchi C.
      • Shen Y.
      • Bittner M.-I.
      Deep learning: A review for the radiation oncologist.
      ,
      • Shen C.
      • Nguyen D.
      • Zhou Z.
      • Jiang S.B.
      • Dong B.
      • Jia X.
      An introduction to deep learning in medical physics: Advantages, potential, and challenges.
      ] has led to an alternative approach where the database of prior plans is used instead to train a DL model that directly predicts a three-dimensional dose distribution for the new patient [
      • Shao Y.
      • Zhang X.
      • Wu G.
      • Gu Q.
      • Wang J.
      • Ying Y.
      • et al.
      Prediction of three-dimensional radiotherapy optimal dose distributions for lung cancer patients with asymmetric network.
      ,
      • Zhou J.
      • Peng Z.
      • Song Y.
      • Chang Y.
      • Pei X.
      • Sheng L.
      • et al.
      A method of using deep learning to predict three-dimensional dose distributions for intensity-modulated radiotherapy of rectal cancer.
      ,
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Fan J.
      • Wang J.
      • Chen Z.
      • Hu C.
      • Zhang Z.
      • Hu W.
      Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      ,
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ,

      Nguyen D, Long T, Jia X, Lu W, Gu X, Iqbal Z, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning 2017.

      ,
      • Chen X.
      • Men K.
      • Li Y.
      • Yi J.
      • Dai J.
      A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning.
      ,
      • Kearney V.
      • Chan J.W.
      • Haaf S.
      • Descovich M.
      • Solberg T.D.
      DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks.
      ]. DL is a sub-family of machine learning (ML), and its success lies in the fact that, in contrast to classical ML algorithms, DL models are able to learn without the need for hand-crafted features. This allows them to extract the most relevant global and local features from the patient’s anatomy (input images) and then map each voxel to the optimal dose value. This voxel-wise dose map is then used to guide the optimization process in the TPS and generate the final treatment plan, which is known as the dose mimicking process [
      • Fan J.
      • Wang J.
      • Chen Z.
      • Hu C.
      • Zhang Z.
      • Hu W.
      Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      ,
      • Petersson K.
      • Nilsson P.
      • Engström P.
      • Knöös T.
      • Ceberg C.
      Evaluation of dual-arc VMAT radiotherapy treatment plans automatically generated via dose mimicking.
      ,
      • McIntosh C.
      • Welch M.
      • McNiven A.
      • Jaffray D.A.
      • Purdie T.G.
      Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
      ]. Alternatively, DVH metrics can be computed from the predicted dose distribution and used as objectives for an inverse optimization problem to generate the treatment plan [
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      Knowledge-based automated planning with three-dimensional generative adversarial networks.
      ,
      • Babier A.
      • Boutilier J.J.
      • Sharpe M.B.
      • McNiven A.L.
      • Chan T.C.Y.
      Inverse optimization of objective function weights for treatment planning using clinical dose-volume histograms.
      ].
      Since the first publications in 2017 [

      Nguyen D, Long T, Jia X, Lu W, Gu X, Iqbal Z, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning 2017.

      ,
      • McIntosh C.
      • Welch M.
      • McNiven A.
      • Jaffray D.A.
      • Purdie T.G.
      Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
      ], dose prediction with deep learning has become a very active research topic. Numerous studies have investigated its application to different tumor locations and treatment modalities [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Fan J.
      • Wang J.
      • Chen Z.
      • Hu C.
      • Zhang Z.
      • Hu W.
      Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      ,
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ,

      Nguyen D, Long T, Jia X, Lu W, Gu X, Iqbal Z, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning 2017.

      ,
      • Chen X.
      • Men K.
      • Li Y.
      • Yi J.
      • Dai J.
      A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning.
      ,
      • Kearney V.
      • Chan J.W.
      • Haaf S.
      • Descovich M.
      • Solberg T.D.
      DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks.
      ,
      • McIntosh C.
      • Welch M.
      • McNiven A.
      • Jaffray D.A.
      • Purdie T.G.
      Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
      ,
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      Knowledge-based automated planning with three-dimensional generative adversarial networks.
      ,
      • Liu Z.
      • Fan J.
      • Li M.
      • Yan H.
      • Hu Z.
      • Huang P.
      • et al.
      A deep learning method for prediction of three-dimensional dose distribution of helical tomotherapy.
      ,
      • Song Y.
      • Hu J.
      • Liu Y.
      • Hu H.
      • Huang Y.
      • Bai S.
      • et al.
      Dose prediction using a deep neural network for accelerated planning of rectal cancer radiotherapy.
      ,
      • Murakami Y.
      • Magome T.
      • Matsumoto K.
      • Sato T.
      • Yoshioka Y.
      • Oguchi M.
      Fully automated dose prediction using generative adversarial networks in prostate cancer patients.
      ,
      • Kandalan R.N.
      • Nguyen D.
      • Rezaeian N.H.
      • Barragán-Montero A.M.
      • Breedveld S.
      • Namuduri K.
      • et al.
      Dose prediction with deep learning for prostate cancer radiation therapy: Model adaptation to different treatment planning practices.
      ,
      • Kearney V.
      • Chan J.W.
      • Wang T.
      • Perry A.
      • Descovich M.
      • Morin O.
      • et al.
      DoseGAN: a generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation.
      ,
      • Ma M.
      • Kovalchuk N.
      • Buyyounouski M.K.
      • Xing L.
      • Yang Y.
      Incorporating dosimetric features into the prediction of 3D VMAT dose distributions using deep convolutional neural network.
      ,
      • Kajikawa T.
      • Kadoya N.
      • Ito K.
      • Takayama Y.
      • Chiba T.
      • Tomori S.
      • et al.
      A convolutional neural network approach for IMRT dose distribution prediction in prostate cancer patients.
      ,
      • Guerreiro F.
      • Seravalli E.
      • Janssens G.O.
      • Maduro J.H.
      • Knopf A.C.
      • Langendijk J.A.
      • et al.
      Deep learning prediction of proton and photon dose distributions for paediatric abdominal tumours.
      ]. The main DL architectures investigated in these studies have been variants of U-Net (a type of Convolutional Neural Network with a downsampling and an upsampling part [

      Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing; 2015, p. 234–41.

      ]) and Generative Adversarial Networks (GANs) [
      • Goodfellow I.J.
      • Pouget-Abadie J.
      • Mirza M.
      • Xu B.
      • Warde-Farley D.
      • Ozair S.
      • et al.
      Generative Adversarial Networks.
      ]. Additionally, different ways to improve the performance of the models (e.g. dense connections [
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ], dilated convolutions [
      • Zhang J.
      • Liu S.
      • Li T.
      • Mao R.
      • Du C.
      • Liu J.
      Voxel-level radiotherapy dose prediction using densely connected network with dilated convolutions.
      ], DVH-based loss functions [
      • Nguyen D.
      • McBeth R.
      • Sadeghnejad Barkousaraie A.
      • Bohara G.
      • Shen C.
      • Jia X.
      • et al.
      Incorporating human and learned domain knowledge into training deep neural networks: A differentiable dose-volume histogram and adversarial inspired framework for generating Pareto optimal dose distributions in radiation therapy.
      ] or extra input channels [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Zhang J.
      • Liu S.
      • Li T.
      • Mao R.
      • Du C.
      • Liu J.
      Voxel-level radiotherapy dose prediction using densely connected network with dilated convolutions.
      ]) have been explored. However, little attention has been paid to the influence of the training database on the model’s performance. Given that DL is a fully data-driven technology, the quality of the database used to train the model can significantly influence the results. In computer science, the term Garbage In, Garbage Out (GIGO) describes the fact that a model trained with a low-quality database will produce faulty outputs. A high-quality database should therefore contain a large number of properly labelled samples that are representative of the problem to solve. For dose prediction problems, this implies a sufficiently large database to cover the patient population distribution (e.g. anatomical variability, tumor size, …), where each treatment plan ensures the optimal radiation dose distribution for each patient according to specific clinical goals.
      For computer vision tasks, the existing databases (e.g. ImageNet) contain thousands to millions of natural images (e.g. animals, objects, …), and it has been shown that the model’s performance increases logarithmically with the volume of training data [
      • Sun C.
      • Shrivastava A.
      • Singh S.
      • Gupta A.
      Revisiting Unreasonable Effectiveness of Data in Deep Learning Era.
      ]. However, in the medical field, reaching a database of 50 patients with well-curated images and annotations is already a challenge for most institutions [
      • Willemink M.J.
      • Koszek W.A.
      • Hardell C.
      • Wu J.
      • Fleischmann D.
      • Harvey H.
      • et al.
      Preparing Medical Imaging Data for Machine Learning.
      ]. Given the ability of current state-of-the-art DL architectures used for dose prediction to learn from small databases [

      Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing; 2015, p. 234–41.

      ], the unwritten rule is that a minimum of 50 to 100 patients is enough to build a good model. Hence, published studies have so far trained models with retrospective clinical databases containing between 80 and 200 patients [
      • Shao Y.
      • Zhang X.
      • Wu G.
      • Gu Q.
      • Wang J.
      • Ying Y.
      • et al.
      Prediction of three-dimensional radiotherapy optimal dose distributions for lung cancer patients with asymmetric network.
      ,
      • Zhou J.
      • Peng Z.
      • Song Y.
      • Chang Y.
      • Pei X.
      • Sheng L.
      • et al.
      A method of using deep learning to predict three-dimensional dose distributions for intensity-modulated radiotherapy of rectal cancer.
      ,
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Fan J.
      • Wang J.
      • Chen Z.
      • Hu C.
      • Zhang Z.
      • Hu W.
      Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      ,
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ,

      Nguyen D, Long T, Jia X, Lu W, Gu X, Iqbal Z, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning 2017.

      ,
      • Chen X.
      • Men K.
      • Li Y.
      • Yi J.
      • Dai J.
      A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning.
      ,
      • Kearney V.
      • Chan J.W.
      • Haaf S.
      • Descovich M.
      • Solberg T.D.
      DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks.
      ]. However, it is unclear how the clinical variability in a retrospective database (e.g. different planners, treatment machines, physicians, …) and the size of the database influence the results.
      The present study aims to shift the attention from model to data, by analysing the effect of data quantity and quality on the performance of DL models for dose prediction, in the specific framework of neoadjuvant radiotherapy treatment for esophageal cancer. This experimental study compares the performance of a DL model for dose prediction trained with a retrospective clinical database, including typical variability (such as different physician contourers, physicist planners, treatment machines and planning class-solutions), with that of a model trained with a homogenized database, specifically created to reduce all possible variabilities. Additionally, the effect of the size of the database was analysed.

      2. Materials and methods

      2.1 Patient database

      Two different databases were used in this study, referred to as the “variable database” (VarDB) and “homogenized database” (HomDB).
      The variable database was extracted retrospectively from our patient directory, and contained 56 patients with esophageal cancer treated with intensity-modulated radiotherapy (IMRT) at our institution from 2016 to 2020. Since the treatment protocol has evolved over time, this database (Fig. 1, left) contained three different treatment machines (Clinac 2100C/D, TrueBeam, and Halcyon, Varian Medical Systems, Palo Alto CA), different beam configurations (from 5 to 9 coplanar beams) and different beam energies (6 or 10 MV). In addition, it involved different physicians and medical physicists for contouring and planning, respectively. Dose calculation and optimization were performed using the AAA algorithm (version 10.0.28 or 15.6.03) of the treatment planning system (Eclipse, Varian Medical Systems, Palo Alto CA). The version of the treatment planning system evolved during the time frame of the study from Eclipse 13 to Eclipse 15.1 and then Eclipse 15.6.
      Fig. 1. Illustration of the variability, with respect to the number of machines and beams, for the variable database (VarDB) versus the homogenized database (HomDB). Each marker represents a patient in the database, amounting to a total of 56 patients in both databases. Note that the figure does not visually show the user-dependent variability (i.e. different physicians for contouring and different planners), which was additionally present in the VarDB, and reduced in the HomDB, since all operations were performed by the same observers.
      The homogenized database (Fig. 1, right) was intentionally built to reduce the variabilities in the variable database, by re-contouring and re-planning the same 56 patients from the variable database, carefully following organ at risk delineation guidelines [
      • Jabbour S.K.
      • Hashem S.A.
      • Bosch W.
      • Kim T.K.
      • Finkelstein S.E.
      • Anderson B.M.
      • et al.
      Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
      ,
      • Feng M.
      • Moran J.M.
      • Koelling T.
      • Chughtai A.
      • Chan J.L.
      • Freedman L.
      • et al.
      Development and validation of a heart atlas to study cardiac exposure to radiation following treatment for breast cancer.
      ,

      Kong FM, Quint L, Machtay M, Bradley J. Atlas for organs at risk (OARS) in thoracic radiation therapy 2013.

      ] and a fixed planning protocol. The 56 patients were uniformly planned using a seven-beam IMRT class-solution on Halcyon (beam configuration at 0°, 30°, 60°, 155°, 220°, 300°, 330° angles) and an updated list of dose constraints [

      Ajani JA, D’Amico TA, Bentrem DJ, Chao J, Corvera C, Das P, et al. Esophageal and Esophagogastric Junction Cancers, Version 2.2019, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2019;17:855–83.

      ]. For all plans, optimisation was performed by the same observer (medical physicist with 10 years of experience in inverse planning optimization with Eclipse), starting from a fixed set of objectives and weights, with clinical judgement provided by one radiation oncologist. Higher priorities were set for the dose constraints in the lungs, since recent studies [
      • Beukema J.C.
      • Kawaguchi Y.
      • Sijtsema N.M.
      • Zhai T.-T.
      • Langendijk J.A.
      • van Dijk L.V.
      • et al.
      Can we safely reduce the radiation dose to the heart while compromising the dose to the lungs in oesophageal cancer patients?.
      ,
      • Thomas M.
      • Defraene G.
      • Lambrecht M.
      • Deng W.
      • Moons J.
      • Nafteux P.
      • et al.
      NTCP model for postoperative complications and one-year mortality after trimodality treatment in oesophageal cancer.
      ] have found more evidence for the correlation between lung dose-volume parameters and pulmonary toxicity and survival [
      • Lee H.K.
      • Vaporciyan A.A.
      • Cox J.D.
      • Tucker S.L.
      • Putnam J.B.
      • Ajani J.A.
      • et al.
      Postoperative pulmonary complications after preoperative chemoradiation for esophageal carcinoma: correlation with pulmonary dose–volume histogram parameters.
      ,
      • Wang S.-L.
      • Liao Z.
      • Vaporciyan A.A.
      • Tucker S.L.
      • Liu H.
      • Wei X.
      • et al.
      Investigation of clinical and dosimetric factors associated with postoperative pulmonary complications in esophageal cancer patients treated with concurrent chemoradiotherapy followed by surgery.
      ,
      • Tucker S.L.
      • Liu H.H.
      • Wang S.
      • Wei X.
      • Liao Z.
      • Komaki R.
      • et al.
      Dose-volume modeling of the risk of postoperative pulmonary complications among esophageal cancer patients treated with concurrent chemoradiotherapy followed by surgery.
      ]. Hence, all treatment plans in this database were guided towards the lowest possible lung V5, V10, V20, V40 and Dmean. The dose calculation algorithm was AAA version 15.6.03, from the treatment planning system (Eclipse, Varian Medical Systems, Palo Alto CA). The version of Eclipse was the same (15.6) for all HomDB plans.
      Detailed information about the contour delineation and the replanning objectives to generate the HomDB can be found in the Supplementary material. For the VarDB plans, the constraints, objectives and priorities evolved over time, but the exact information could not be retrieved due to the retrospective nature of the database.
      For both the variable and homogenized databases, the prescribed total radiation dose to the planning target volume (PTV) was 45.0 Gy in fractions of 1.8 Gy. The dose map grid size was the same for both databases and equal to 2.5 mm × 2.5 mm (in plane) × 3 mm (slice thickness).
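      For reference, the fixed HomDB protocol parameters stated above can be collected in a small configuration record, as in the following sketch. The class name `ClassSolution` and its field names are hypothetical and chosen for illustration; only the numerical values (machine, gantry angles, prescription, fraction dose and grid size) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSolution:
    """Hypothetical record of the HomDB planning protocol values."""
    machine: str = "Halcyon"
    gantry_angles_deg: tuple = (0, 30, 60, 155, 220, 300, 330)
    prescription_gy: float = 45.0
    dose_per_fraction_gy: float = 1.8
    grid_mm: tuple = (2.5, 2.5, 3.0)  # in-plane x, in-plane y, slice thickness

    @property
    def n_fractions(self) -> int:
        # Total prescribed dose divided by the dose per fraction.
        return round(self.prescription_gy / self.dose_per_fraction_gy)

    @property
    def voxel_volume_mm3(self) -> float:
        # Volume of one dose-grid voxel.
        dx, dy, dz = self.grid_mm
        return dx * dy * dz


protocol = ClassSolution()
n_beams = len(protocol.gantry_angles_deg)
```

      With these values the prescription corresponds to 25 fractions, and each dose voxel covers 18.75 mm³.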
      The use of patient data for the study was approved by the Institutional Ethical Review Board of the University Hospitals Leuven (S59667).

      3. Model architecture

      The DL model architecture was inspired by the popular U-Net [

      Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing; 2015, p. 234–41.

      ], a type of fully convolutional network widely used for medical image segmentation and other medical applications [
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ,
      • Liu L.
      • Cheng J.
      • Quan Q.
      • Wu F.-X.
      • Wang Y.-P.
      • Wang J.
      A survey on U-shaped networks in medical image segmentations.
      ,
      • Wu C.
      • Nguyen D.
      • Xing Y.
      • Barragan A.
      • Schuemann J.
      • Shang H.
      • et al.
      Improving proton dose calculation accuracy by using deep learning.
      ,
      • Xing Y.
      • Zhang Y.
      • Nguyen D.
      • Lin M.-H.
      • Lu W.
      • Jiang S.
      Boosting radiotherapy dose calculation accuracy with deep learning.
      ]. U-Net-type architectures have gained interest for the prediction of dose distributions in radiotherapy treatments, due to their ability to include both local and global features from the input images (i.e. the patient’s anatomy) to generate a pixel-wise or voxel-wise prediction, with their two-dimensional (2D) or 3D variant, respectively [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,

      Nguyen D, Long T, Jia X, Lu W, Gu X, Iqbal Z, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning 2017.

      ,
      • Zhang J.
      • Liu S.
      • Li T.
      • Mao R.
      • Du C.
      • Liu J.
      Voxel-level radiotherapy dose prediction using densely connected network with dilated convolutions.
      ]. In our case, we used a variant of the 3D U-Net that was first developed by Nguyen et al. [
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ], and which includes dense connections [

      Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. https://doi.org/10.1109/cvpr.2017.243.

      ] between the convolutional layers within the same resolution or “hierarchy” in the U-Net. This Hierarchically Densely Connected U-Net or HD U-Net has demonstrated a more efficient feature propagation with respect to the standard U-Net [
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ]. The HD U-Net has been previously used for dose prediction in head and neck [
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ] and lung cancer patients [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ]. The mean squared error (MSE) between the predicted dose and the ground truth dose was used as the loss function. The detailed architecture has been described elsewhere [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ] and is presented in Fig. 2. The model was implemented in Python, using dedicated DL libraries (i.e. TensorFlow and Keras).
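      As a minimal sketch of the loss described above, the voxel-wise MSE between a predicted and a ground truth dose volume can be written with NumPy. The function name `mse_loss` and the toy arrays are illustrative assumptions; the study itself relied on TensorFlow and Keras, whose training code is not reproduced here.

```python
import numpy as np

def mse_loss(pred_dose, true_dose):
    """Mean squared error over all voxels of two dose arrays (same shape, in Gy)."""
    pred_dose = np.asarray(pred_dose, dtype=float)
    true_dose = np.asarray(true_dose, dtype=float)
    if pred_dose.shape != true_dose.shape:
        raise ValueError("dose arrays must have the same shape")
    return float(np.mean((pred_dose - true_dose) ** 2))

# Toy example (hypothetical values): a 2 x 2 x 2 dose grid in Gy.
true_dose = np.full((2, 2, 2), 45.0)
pred_dose = true_dose.copy()
pred_dose[0, 0, 0] = 43.0  # one voxel under-dosed by 2 Gy
loss = mse_loss(pred_dose, true_dose)  # squared error of 4 spread over 8 voxels
```

      Because the errors are squared, a few voxels with large deviations weigh more heavily than many voxels with small deviations.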
      Fig. 2. Architecture of the model used in this study. Black numbers on the left side of the model represent the volume shape and resolution at a specific hierarchy. Red numbers represent the number of feature maps at a particular layer, starting with the 11 input channels and 16 feature maps. Orange features represent the newly calculated features and trainable parameters to learn, while blue features are copied or max pooled features that do not need trainable parameters. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
      For our particular case, all dose distributions and contour masks were resampled to a voxel resolution of 3 × 3 × 3 mm³, and the model was trained using image patches of 128 × 128 × 96 voxels. Eleven input channels were used, which comprised 3D masks for the available regions of interest (ROIs): (1) one channel containing a mask with the prescription dose (Dpre = 45.0 Gy) for voxels inside and 0 for voxels outside the PTV; and (2) ten channels, each containing a binary mask with 1 for voxels inside and 0 for voxels outside one of the relevant organs at risk for esophageal cancer (i.e. body, heart, lungs, liver, left and right kidney, spinal canal, spinal canal + 3 mm, fundus and left ventricle). The number of starting feature maps was 16.
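      The eleven-channel input described above could be assembled as in the following sketch. The channel order, the channels-last layout, the structure labels in `OAR_NAMES`, and the helper `build_input` are assumptions made for illustration; only the prescription value and the list of structures come from the text.

```python
import numpy as np

PRESCRIPTION_GY = 45.0  # Dpre from the text

# Ten organ-at-risk channels; the label strings are illustrative.
OAR_NAMES = [
    "body", "heart", "lungs", "liver", "kidney_left", "kidney_right",
    "spinal_canal", "spinal_canal_3mm", "fundus", "left_ventricle",
]

def build_input(ptv_mask, oar_masks):
    """Stack one PTV prescription channel and ten binary OAR channels.

    ptv_mask  : boolean array (z, y, x), True inside the PTV
    oar_masks : dict mapping each name in OAR_NAMES to a boolean array
    returns   : float32 array of shape (z, y, x, 11), channels last
    """
    channels = [np.where(ptv_mask, PRESCRIPTION_GY, 0.0)]
    for name in OAR_NAMES:
        channels.append(oar_masks[name].astype(float))
    return np.stack(channels, axis=-1).astype(np.float32)

# Toy volume; the study used patches of 128 x 128 x 96 voxels.
shape = (4, 8, 8)
ptv = np.zeros(shape, dtype=bool)
ptv[1:3, 2:5, 2:5] = True
oars = {name: np.zeros(shape, dtype=bool) for name in OAR_NAMES}
oars["body"][:] = True
x = build_input(ptv, oars)
```

      Encoding the prescription dose in the PTV channel, rather than a plain 1, gives the network the target dose level directly in its input.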

      4. Experiments

      To analyse the effect of the quality and size of the database on the model’s performance, two different experiments (Experiment, E) were conducted. Fig. 3 illustrates all considered models.
      Fig. 3. Summary of all considered models, trained with different subsets of patients from the variable (VarDB) and homogenized (HomDB) databases. Each marker represents a patient; the color and the shape define the beam setup and the treatment machine, respectively. Experiment 1 analysed the user-dependent variability, for a given treatment machine and beam setup. Hence, it involved two models: E26-VarDB versus E26-HomDB. Experiment 2 analysed the size of the database, and involved four extra models for each database: E16-VarDB versus E16-HomDB, E36-VarDB versus E36-HomDB, E46-VarDB versus E46-HomDB, and E56-VarDB versus E56-HomDB.
      The first experiment (Experiment 1) aimed at investigating the effect of user-dependent variability (i.e. planning and delineation variability), for a given treatment machine and beam configuration. Hence, for this experiment, we used all patients planned with 7 beams for Halcyon (Fig. 1, blue circles), which amounted to a total of 26 patients. Two models were trained and tested with the patients coming from the variable (E26-VarDB) and the homogenized database (E26-HomDB), respectively. The training was performed using 20 patients and following a 5-fold cross-validation procedure, while the testing was done on the remaining 6 patients, using all 5 cross-validation models independently. The goal of this first experiment was to investigate the influence of the user-dependent variability (i.e. planning and delineation) on the model’s performance, without being biased by other factors associated with the plan setup, such as the beam configuration and the choice of treatment machine.
      The second experiment (Experiment 2) analysed the influence of the database size on the model’s performance, for both the variable and the homogenized databases. Starting from the subset of 26 patients used for Experiment 1, the size of the training database was increased and decreased in groups of 10 patients. When decreasing the database, only one extra model was created, using 16 patients (E16-VarDB vs E16-HomDB). When increasing the database, three extra models were created, up to the full dataset, for both the variable and homogenized databases. These were trained and tested with 36 (E36-VarDB and E36-HomDB), 46 (E46-VarDB and E46-HomDB), and 56 patients (E56-VarDB and E56-HomDB), respectively.
      The current state-of-the-art in our clinic is to treat esophageal cancer patients using 7 beams with Halcyon. Hence, for all experiments, testing was always performed on the same set of 6 patients (planned with 7 beams for Halcyon), since these are the plans we want to be able to predict for future patients. The training followed a k-fold cross-validation process, which consists of splitting the training data into k subsets and then training k models, each using one subset for validation (a different one each time) and the remaining k-1 subsets for training. A 5-fold cross-validation was used in our case, and the five resulting models were later used to predict the dose for the 6 patients from the test set. For each fold, the network was trained for 250 epochs and the weights corresponding to the epoch with the lowest validation loss were selected as the final model. All these operations were performed on an NVIDIA GeForce RTX 2080 Graphical Processing Unit (GPU) with 11 GB dedicated RAM. The training time for each model depended on the number of patients used, taking up to 1.6, 2.2, 3.4, 4.7, and 6 h for E16, E26, E36, E46, and E56, respectively, for each fold in the cross-validation. The inference time was the same for all models, with an average value of 24 s.
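      The data split for the E26 setting can be sketched as follows: 6 patients are held out for testing, the remaining 20 are split into 5 folds, and the epoch with the lowest validation loss is kept. The patient identifiers, the random seed, the choice of which 6 patients form the test set, and the helper names are hypothetical; the study's actual assignment is the one shown in Fig. 3.

```python
import random

def k_fold_splits(train_ids, k=5, seed=0):
    """Return k (train, validation) pairs; each patient is validated exactly once."""
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((train, val))
    return splits

def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

# E26 setting: 26 patients, 6 held out for testing, 20 used for 5-fold CV.
patients = [f"P{i:02d}" for i in range(26)]  # hypothetical identifiers
test_ids = patients[:6]                      # fixed test set (illustrative choice)
cv_ids = patients[6:]
splits = k_fold_splits(cv_ids, k=5)
fold_sizes = [(len(train), len(val)) for train, val in splits]
chosen = best_epoch([0.9, 0.5, 0.4, 0.45, 0.6])  # toy validation curve
```

      Each of the five fold models is then applied to the same test patients, so the test results can be compared across all database sizes.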
      The accuracy of the models was evaluated by analysing the difference between relevant dose-volume histogram (DVH) metrics for the predicted and the ground truth doses, using either box plots or the mean absolute error (MAE) for each of the considered metrics. In order to have a single value that allows us to rate each model with respect to the others, an average composite error was computed by averaging the absolute errors from a set of DVH metrics including: PTV [D95, D5], lungs [Dmean, V5, V10, V20, V30, V40], heart [Dmean, V40], liver [Dmean, V20, V30], kidneys [Dmean], spinal canal + 3 [D2], left ventricle [Dmean], and fundus [Dmean]. The absolute error for the dose metrics was expressed as a percentage of the prescription dose, while the error for the volume metrics was expressed as volume percentage. In addition, the MAE between the predicted dose and the ground truth for all voxels in the body contour was computed.
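      The evaluation metrics described above can be sketched with NumPy as follows. The function names and the toy data are assumptions for illustration, and the DVH definitions used here (Dx as the minimum dose to the hottest x% of the volume, Vy as the volume percentage receiving at least y Gy) are the conventional ones rather than a reproduction of the study's code.

```python
import numpy as np

PRESCRIPTION_GY = 45.0  # Dpre from the text

def d_x(dose, mask, x_percent):
    """Dx: minimum dose (Gy) received by the hottest x% of the structure volume."""
    return float(np.percentile(dose[mask], 100.0 - x_percent))

def v_y(dose, mask, y_gy):
    """Vy: percentage of the structure volume receiving at least y Gy."""
    return float(100.0 * np.mean(dose[mask] >= y_gy))

def d_mean(dose, mask):
    """Mean dose (Gy) inside the structure."""
    return float(np.mean(dose[mask]))

def dose_error_pct(pred_gy, true_gy):
    """Absolute error of a dose metric as a percentage of the prescription dose."""
    return abs(pred_gy - true_gy) / PRESCRIPTION_GY * 100.0

def body_mae(pred, true, body_mask):
    """Mean absolute error (Gy) over all voxels inside the body contour."""
    return float(np.mean(np.abs(pred[body_mask] - true[body_mask])))

# Toy data: a random "ground truth" and a prediction that is 0.9 Gy too high.
rng = np.random.default_rng(0)
true = rng.uniform(0.0, 45.0, size=(8, 8, 8))
pred = true + 0.9
body = np.ones(true.shape, dtype=bool)
lungs = np.zeros(true.shape, dtype=bool)
lungs[:4] = True

errors = [
    dose_error_pct(d_mean(pred, lungs), d_mean(true, lungs)),  # % of Dpre
    abs(v_y(pred, lungs, 20.0) - v_y(true, lungs, 20.0)),      # volume %
]
composite = float(np.mean(errors))  # average over the selected metrics
mae = body_mae(pred, true, body)
```

      Expressing dose errors relative to the prescription puts them on the same percentage scale as the volume errors, which is what allows the two kinds of metric to be averaged into a single composite value.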

      5. Results

      5.1 Experiment 1

      Fig. 4 shows the results for Experiment 1 (model E26-VarDB versus E26-HomDB) on the test set, comparing the error for relevant DVH metrics for the model trained on the variable and the homogenized database, respectively. Although the median value is similar for both models in most of the organs, the box plots show a reduction of the error for the E26-HomDB model, since they are narrower than those for E26-VarDB. This is confirmed by the MAE of the DVH metrics (Table 1). On the one hand, the error on the PTV dose coverage (D95) for the test set decreased from 2.03 ± 1.37% of the prescribed dose for the VarDB model to 1.59 ± 1.41% for the HomDB model. On the other hand, the error for the considered organs at risk also improved for most DVH metrics. For instance, the error for the heart Dmean on the test set was reduced by 2.7% of the prescribed dose (1.2 Gy) for the HomDB model, in comparison with the VarDB model. The improvement in the predicted Dmean in the test set for other organs, such as the lungs or liver, was smaller (around 0.2% of the dose prescription). Table 1 also summarizes the results from the cross-validation, which show an even larger difference between the VarDB and HomDB models for some specific cases. In particular, the HomDB model outperformed the VarDB model for the lung V5, which improved by 3.3% (absolute volume).
      Fig. 4. Comparison between model E26-VarDB and E26-HomDB (Experiment 1) for the evaluation on the test set (including all five cross-validation models). Box plots for the difference between relevant DVH metrics for the real and predicted doses on the PTV volume and the organs included in the model’s input channels (i.e., lungs, heart, spinal canal + 3, liver, right and left kidney, left ventricle and fundus). VarDB refers to the variable database while HomDB refers to the homogenized database.
      Table 1. Mean absolute error (MAE) and its standard deviation (mean ± SD) for relevant DVH metrics on the PTV and on several organs for cross-validation (average prediction on the validation set for all five folds), and testing (average prediction on the test set for all five folds), for all models from Experiment 1 and 2. The values are expressed as percentage of the prescription dose (Dpre = 45.0 Gy) for the metrics reporting the dose received by x% of volume (Dx), and as absolute difference for the metrics reporting the volume (in %) receiving a dose of y Gy (Vy).
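      Mean absolute error for DVH metrics (mean ± SD)
      | ROI | Metric | Model | Cross-validation, VarDB | Cross-validation, HomDB | Testing, VarDB | Testing, HomDB |
      |---|---|---|---|---|---|---|
      | PTV | D95 (% Dpre) | E16 | 2.17 ± 0.99 | 1.15 ± 0.86 | 3.12 ± 1.80 | 1.39 ± 1.05 |
      | PTV | D95 (% Dpre) | E26 | 1.87 ± 0.85 | 1.45 ± 1.28 | 2.03 ± 1.37 | 1.59 ± 1.41 |
      | PTV | D95 (% Dpre) | E36 | 1.68 ± 1.40 | 0.95 ± 0.77 | 1.68 ± 2.35 | 0.95 ± 0.99 |
      | PTV | D95 (% Dpre) | E46 | 1.63 ± 1.19 | 1.11 ± 0.81 | 2.81 ± 2.30 | 1.13 ± 0.76 |
      | PTV | D95 (% Dpre) | E56 | 1.55 ± 1.30 | 1.98 ± 1.03 | 2.77 ± 2.21 | 1.93 ± 1.15 |
      | PTV | D5 (% Dpre) | E16 | 1.37 ± 0.81 | 1.57 ± 0.95 | 1.36 ± 0.67 | 1.13 ± 0.84 |
      | PTV | D5 (% Dpre) | E26 | 2.87 ± 1.90 | 2.11 ± 2.05 | 2.94 ± 2.01 | 1.96 ± 1.73 |
      | PTV | D5 (% Dpre) | E36 | 1.77 ± 1.25 | 1.14 ± 0.77 | 1.77 ± 0.91 | 1.14 ± 0.63 |
      | PTV | D5 (% Dpre) | E46 | 2.78 ± 2.16 | 1.63 ± 1.19 | 2.48 ± 2.05 | 1.54 ± 1.19 |
      | PTV | D5 (% Dpre) | E56 | 2.24 ± 1.24 | 1.43 ± 1.03 | 1.95 ± 1.10 | 1.38 ± 0.87 |
      | Liver | Dmean (% Dpre) | E16 | 2.71 ± 3.08 | 1.58 ± 1.10 | 2.38 ± 1.97 | 2.19 ± 1.58 |
      | Liver | Dmean (% Dpre) | E26 | 2.26 ± 1.66 | 1.59 ± 1.19 | 1.86 ± 1.34 | 1.79 ± 1.05 |
      | Liver | Dmean (% Dpre) | E36 | 2.96 ± 2.17 | 1.36 ± 0.94 | 2.96 ± 1.30 | 1.36 ± 0.92 |
      | Liver | Dmean (% Dpre) | E46 | 3.75 ± 2.35 | 1.69 ± 0.96 | 3.99 ± 1.81 | 2.00 ± 0.96 |
      | Liver | Dmean (% Dpre) | E56 | 3.22 ± 2.37 | 1.64 ± 1.03 | 4.02 ± 2.04 | 2.10 ± 1.32 |
      | Heart | Dmean (% Dpre) | E16 | 6.76 ± 4.97 | 3.15 ± 2.53 | 5.30 ± 3.24 | 3.59 ± 1.99 |
      | Heart | Dmean (% Dpre) | E26 | 6.49 ± 4.04 | 3.50 ± 2.09 | 5.39 ± 3.75 | 2.69 ± 1.82 |
      | Heart | Dmean (% Dpre) | E36 | 5.17 ± 4.05 | 3.75 ± 2.72 | 5.17 ± 3.57 | 3.75 ± 1.90 |
      | Heart | Dmean (% Dpre) | E46 | 6.66 ± 3.98 | 3.31 ± 2.87 | 5.07 ± 3.52 | 3.02 ± 1.56 |
      | Heart | Dmean (% Dpre) | E56 | 5.96 ± 4.23 | 2.82 ± 1.75 | 3.96 ± 3.02 | 2.35 ± 1.59 |
      | Heart | V40 (%) | E16 | 2.78 ± 2.52 | 1.55 ± 0.91 | 1.70 ± 0.90 | 1.67 ± 0.89 |
      | Heart | V40 (%) | E26 | 1.75 ± 1.12 | 1.52 ± 1.19 | 1.13 ± 0.56 | 1.64 ± 1.03 |
      | Heart | V40 (%) | E36 | 2.01 ± 2.48 | 0.92 ± 0.67 | 2.01 ± 1.16 | 0.92 ± 1.02 |
      | Heart | V40 (%) | E46 | 2.48 ± 1.99 | 0.88 ± 0.80 | 1.75 ± 0.87 | 1.24 ± 1.12 |
      | Heart | V40 (%) | E56 | 2.23 ± 1.50 | 0.86 ± 0.74 | 1.39 ± 0.64 | 1.39 ± 0.92 |
      | SC + 3 mm | D2 (% Dpre) | E16 | 5.07 ± 3.51 | 4.13 ± 3.83 | 6.76 ± 3.30 | 5.03 ± 3.25 |
      | SC + 3 mm | D2 (% Dpre) | E26 | 4.70 ± 3.71 | 4.63 ± 2.88 | 5.26 ± 3.41 | 4.08 ± 3.33 |
      | SC + 3 mm | D2 (% Dpre) | E36 | 5.32 ± 4.70 | 4.68 ± 3.35 | 5.32 ± 3.65 | 4.68 ± 3.32 |
      | SC + 3 mm | D2 (% Dpre) | E46 | 8.66 ± 5.85 | 3.54 ± 2.01 | 9.75 ± 3.45 | 5.91 ± 4.21 |
      | SC + 3 mm | D2 (% Dpre) | E56 | 7.27 ± 4.37 | 4.48 ± 3.07 | 7.62 ± 4.03 | 5.84 ± 4.24 |
      | Lungs | Dmean (% Dpre) | E16 | 2.55 ± 1.71 | 1.42 ± 1.25 | 1.29 ± 0.85 | 1.18 ± 0.86 |
      | Lungs | Dmean (% Dpre) | E26 | 2.81 ± 2.20 | 1.53 ± 0.92 | 1.33 ± 1.07 | 1.07 ± 0.72 |
      | Lungs | Dmean (% Dpre) | E36 | 2.31 ± 1.66 | 1.34 ± 1.21 | 2.31 ± 1.17 | 1.34 ± 0.57 |
      | Lungs | Dmean (% Dpre) | E46 | 2.83 ± 1.93 | 1.35 ± 1.26 | 1.86 ± 1.15 | 0.86 ± 0.65 |
      | Lungs | Dmean (% Dpre) | E56 | 2.54 ± 1.87 | 1.35 ± 1.26 | 2.08 ± 1.35 | 0.75 ± 0.63 |
      | Lungs | V5 (%) | E16 | 5.62 ± 3.39 | 3.10 ± 3.18 | 3.11 ± 2.67 | 4.53 ± 5.02 |
      | Lungs | V5 (%) | E26 | 6.25 ± 4.72 | 2.98 ± 2.41 | 3.71 ± 2.61 | 3.37 ± 3.03 |
      | Lungs | V5 (%) | E36 | 6.44 ± 5.68 | 2.99 ± 2.66 | 6.44 ± 2.43 | 2.99 ± 3.14 |
      | Lungs | V5 (%) | E46 | 6.20 ± 5.50 | 2.96 ± 2.03 | 2.71 ± 2.32 | 3.4 ± 3.14 |
      | Lungs | V5 (%) | E56 | 6.63 ± 6.65 | 2.90 ± 2.53 | 2.88 ± 2.28 | 3.27 ± 3.07 |
      | Lungs | V20 (%) | E16 | 4.78 ± 4.75 | 4.15 ± 2.73 | 2.71 ± 2.50 | 2.19 ± 1.79 |
      | Lungs | V20 (%) | E26 | 5.57 ± 3.25 | 3.33 ± 2.17 | 2.68 ± 2.13 | 3.09 ± 2.07 |
      | Lungs | V20 (%) | E36 | 4.65 ± 3.76 | 3.18 ± 2.64 | 4.65 ± 2.13 | 3.18 ± 1.37 |
      | Lungs | V20 (%) | E46 | 6.50 ± 4.36 | 2.88 ± 3.18 | 5.21 ± 3.03 | 1.75 ± 1.31 |
      | Lungs | V20 (%) | E56 | 5.45 ± 3.84 | 3.06 ± 2.50 | 5.23 ± 3.60 | 2.27 ± 1.33 |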
      The composite error (Table 2) also reflects an overall improvement of the predicted DVH metrics for the HomDB model versus the VarDB model. For cross-validation, the error decreased by around 1.3%, while for the test set the reduction was smaller (0.6%). Although the HomDB model clearly outperforms the VarDB model for predicting the DVH metrics, the MAE of all voxels within the body contour (Table 3) only showed a negligible improvement (less than 0.05 Gy).
      Table 2. Average composite error. It was computed by averaging the absolute errors from a set of DVH metrics including: PTV [D95, D5], lungs [Dmean, V5, V10, V20, V30, V40], heart [Dmean, V40], liver [Dmean, V20, V30], kidneys [Dmean], spinal canal + 3 [D2], left ventricle [Dmean], and fundus [Dmean]. The absolute error for the dose metrics was expressed as a percentage of the prescription dose, while the error for the volume metrics was expressed as volume percentage.
      Composite error (%), mean ± SD
      | Model | Cross-validation, VarDB | Cross-validation, HomDB | Testing, VarDB | Testing, HomDB |
      |---|---|---|---|---|
      | E16 | 3.84 ± 3.47 | 2.15 ± 2.02 | 2.99 ± 2.58 | 2.69 ± 2.80 |
      | E26 | 3.69 ± 3.09 | 2.40 ± 2.20 | 2.95 ± 2.86 | 2.33 ± 2.18 |
      | E36 | 3.56 ± 3.28 | 2.19 ± 2.14 | 3.56 ± 2.87 | 2.19 ± 2.07 |
      | E46 | 4.42 ± 3.82 | 2.19 ± 2.09 | 3.82 ± 2.87 | 2.31 ± 2.33 |
      | E56 | 3.93 ± 3.74 | 2.11 ± 1.94 | 3.39 ± 2.66 | 2.28 ± 2.00 |
      Table 3. Mean absolute error (MAE) for all voxels within the body contour corresponding to all models under study (E16, E26, E36, E46, and E56), trained with the variable (VarDB) and homogenized (HomDB) database, averaged over the validation and test sets. The results are expressed as the mean value over all patients for the five cross-validation models, together with the standard deviation in Gy (mean ± SD).
      Mean absolute error (MAE) in Gy, mean ± SD
      | Model | Cross-validation, VarDB | Cross-validation, HomDB | Testing, VarDB | Testing, HomDB |
      |---|---|---|---|---|
      | E16 | 1.51 ± 0.30 | 1.44 ± 0.27 | 1.34 ± 0.42 | 1.48 ± 0.40 |
      | E26 | 1.39 ± 0.39 | 1.34 ± 0.33 | 1.28 ± 0.43 | 1.27 ± 0.32 |
      | E36 | 1.76 ± 0.54 | 1.24 ± 0.30 | 1.46 ± 0.37 | 1.23 ± 0.33 |
      | E46 | 1.77 ± 0.48 | 1.22 ± 0.31 | 1.50 ± 0.42 | 1.19 ± 0.35 |
      | E56 | 1.76 ± 0.50 | 1.21 ± 0.27 | 1.48 ± 0.44 | 1.24 ± 0.32 |
      Fig. 5 shows an example of the DVH (for the PTV and relevant organs) of a selected patient from the test set, for both the VarDB (top) and HomDB models (bottom). For this patient, the largest difference can be observed in the DVH for the heart, where the HomDB model clearly reduced the prediction error.
      Fig. 5. Example of the DVH for a selected patient from the test set for Experiment 1. Top figure shows the DVH corresponding to the model trained with the variable database (VarDB), for the predicted (dashed line) and ground truth doses (solid line); while the bottom figure shows the DVH for the model trained with the homogenized database (HomDB).

      5.2 Experiment 2

      For the models trained with the homogenized database, the MAE for all voxels within the body contour (Table 3) tends to improve during cross-validation when increasing the size of the training database with respect to Experiment 1 (E36, E46, and E56), e.g. a decrease of 0.13 Gy for the E56-HomDB model with respect to the E26-HomDB model. For the test set, the improvement is slightly smaller, with a reduction of 0.08 Gy and 0.03 Gy for the E46-HomDB and E56-HomDB models, respectively. In contrast, for the models trained with the variable database, the results show an inverse trend, with an increase in the MAE when sequentially adding more patients to the training set used in Experiment 1. For instance, the MAE for the E56-VarDB model increased by 0.37 Gy with respect to the E26-VarDB model for the cross-validation set, and by 0.20 Gy for the test set. Decreasing the size of the database with respect to Experiment 1 (E16-HomDB and E16-VarDB) resulted in a higher MAE for both cross-validation and testing (Table 3).
      The MAE for the HomDB models was always lower than that of the VarDB models, except for the E16 testing results.
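      The changes in global MAE relative to the E26 models can be recomputed directly from the mean values in Table 3, as in the short sketch below. The dictionary layout and the helper `change_vs_e26` are illustrative; negative values indicate a lower (better) MAE than E26.

```python
# Global MAE (Gy) from Table 3: (cross-validation, testing) per model.
table3 = {
    "VarDB": {"E16": (1.51, 1.34), "E26": (1.39, 1.28), "E36": (1.76, 1.46),
              "E46": (1.77, 1.50), "E56": (1.76, 1.48)},
    "HomDB": {"E16": (1.44, 1.48), "E26": (1.34, 1.27), "E36": (1.24, 1.23),
              "E46": (1.22, 1.19), "E56": (1.21, 1.24)},
}

def change_vs_e26(db, model, column):
    """MAE change (Gy) relative to E26; column 0 = cross-validation, 1 = testing."""
    return round(table3[db][model][column] - table3[db]["E26"][column], 2)

for db in ("HomDB", "VarDB"):
    for model in ("E16", "E36", "E46", "E56"):
        cv = change_vs_e26(db, model, 0)
        test = change_vs_e26(db, model, 1)
        print(f"{model}-{db}: CV {cv:+.2f} Gy, test {test:+.2f} Gy")
```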
      The error for relevant DVH metrics on the test set for the PTV and a few selected organs is shown with box plots in Fig. 6. For the homogenized database (Fig. 6, right), a slight improvement can be observed for the lungs and heart Dmean when increasing the size of the training set with respect to Experiment 1. However, for the rest of the plotted DVH metrics (PTV D95, Dmean liver), there is no clear trend. For the variable database (Fig. 6, left), increasing the size of the training set did not improve the results for any of the plotted metrics. For the lungs and liver Dmean, the prediction error gradually increased from the E26-VarDB to the E56-VarDB model.
      Fig. 6. Comparison between all the models trained with an increasing database of 16, 26, 36, 46, and 56 patients (Experiment 2), for the variable (left) and the homogenized (right) databases. The box plots present the results for the evaluation on the test set (including all five cross-validation models). Each box plot shows the difference between relevant DVH metrics for the real and predicted doses on the PTV volume and selected organs (i.e., lungs, heart, spinal canal + 3, liver).
      Fig. 6 also shows that decreasing the size of the database resulted in wider box plots for the OARs in both the variable (E16-VarDB) and the homogenized database (E16-HomDB). In contrast, the PTV D95 presented narrower boxplots when using this smaller patient group, especially for the homogenized database.
      Table 1 presents a more complete overview of the MAE for the selected DVH metrics, for the models trained with both the variable and homogenized databases, which is summarized by the average composite error presented in Table 2. Please note that the cross-validation results should be compared carefully, since different validation sets were used in the E16 (2 patients), E26 (4 patients), and E36, E46, and E56 (6 patients) models (Fig. 3). For the homogenized database, the composite error is slightly better for the models with a larger training set (e.g. 2.28 ± 2.00% for E56-HomDB versus 2.33 ± 2.18% for E26-HomDB for the test set), while the models trained with the variable database present a decreased accuracy when adding more training patients (e.g. 3.39 ± 2.66% for E56-VarDB on the test set versus 2.95 ± 2.86% for E26-VarDB).

      6. Discussion

      The present study aimed at analysing the influence of the size and quality of the training database on the DL model’s performance, in the framework of dose prediction for esophageal cancer patients. For this purpose, two different experiments were designed, which compared models that were trained using a variable and a homogenized database independently (see Section 2). It is important to note that the data quality for DL models refers to how well the training set represents the evaluation set (i.e. validation/test set). Thus, for our particular problem, a higher-quality database (i.e. the homogenized database, HomDB) is defined as the one having a higher consistency between the training and test set, with respect to both user-dependent variability (delineation and planning) and planning setup (class-solution protocol, treatment machine, beam configuration).
      In the first experiment, which aimed at analysing the influence of the user-dependent variability (i.e. planning and delineation variability), the overall performance of the two considered models (E26-VarDB vs E26-HomDB) was very good, achieving a MAE below 1.3 Gy for all voxels within the body contour in the test patients (Table 3). However, the model trained with the homogenized database (E26-HomDB) showed a lower prediction error than the model trained with the variable database (E26-VarDB), indicating a more robust model. Specifically for the lungs, where the dose objectives (V5, V10, V20, V40 and Dmean) for the homogenized database were systematically pushed down towards the lowest possible value, the superior performance of the HomDB models is clear (Table 1). These results demonstrate that a higher consistency with respect to the user-dependent factors (delineation and planning) in the training database has a beneficial effect on the performance of the model. Note that this experiment analysed the user-dependent variability as a whole, but could not distinguish between the individual influence of the variability in delineation and planning, respectively, since this information was impossible to retrieve from the clinical data. Further investigation is required to analyse the individual influence of these two factors.
      The improved performance of the HomDB model compared to the VarDB one brings up the popular debate about “data mining or data farming” for deep learning applications in the radiation oncology field [
      • Mayo C.S.
      • Kessler M.L.
      • Eisbruch A.
      • Weyburne G.
      • Feng M.
      • Hayman J.A.
      • et al.
      The big data effort in radiation oncology: Data mining or data farming?.
      ], and reinforces the position for a data farming approach. Data mining refers to the approach in which the data is just extracted from retrospective databases and directly analysed, assuming the data is well-curated and ready to be used. In contrast, data farming aims to shape and build the databases according to the application under development. In this sense, our results highlight the importance of training DL models with well-curated data, since any variability, errors or bias in the training database will be reflected in the results. Hence, we recommend carefully analysing the clinical requirements for the DL model, and building or “farming” the database accordingly. Moreover, building the database may allow newly gained clinical insights to be included in the prediction model. For instance, in our particular application for dose prediction in esophageal cancer patients, special attention was paid to correctly predicting the dose to the lungs, since recent studies have found a correlation between lung dose metrics, pulmonary toxicity and survival [
      • Beukema J.C.
      • Kawaguchi Y.
      • Sijtsema N.M.
      • Zhai T.-T.
      • Langendijk J.A.
      • van Dijk L.V.
      • et al.
      Can we safely reduce the radiation dose to the heart while compromising the dose to the lungs in oesophageal cancer patients?.
      ,
      • Thomas M.
      • Defraene G.
      • Lambrecht M.
      • Deng W.
      • Moons J.
      • Nafteux P.
      • et al.
      NTCP model for postoperative complications and one-year mortality after trimodality treatment in oesophageal cancer.
      ]. Thus, one needs to keep this in mind when building the database and carefully check the optimality of the most relevant dose constraints.
      The second experiment investigated the influence of the size of the training dataset on the model’s performance, by decreasing and increasing the database with respect to Experiment 1. This resulted in a total of five models, including the one generated in Experiment 1: E16, E26, E36, E46, and E56. For the homogenized database models, an improvement trend is observed when increasing the size of the training set from E26 to E56 models for both the global MAE (Table 3) and the DVH metrics (Table 1, Table 2). But surprisingly, despite having increased the training set by a factor of 2, the improvement was small (e.g. improvement in MAE smaller than 0.15 Gy). In contrast, for the variable database models, the trend was inverted, and the prediction error increased when adding more patients to the training set (Table 1, Table 2, Table 3). This can be explained by the fact that the increase in the number of training patients for the variable (clinical) database models goes with an increase in the database heterogeneity, including different treatment machines and beam configurations (Fig. 3). The larger the heterogeneity in the database, the harder it is to accurately predict the dose distributions, which again supports the need to use a consistent and high-quality database to get the most out of a DL model. The results from Experiment 2 also showed that reducing the database too much (i.e. models E16-HomDB and E16-VarDB) can also lead to a decreased performance of the DL model.
      Regarding the metrics used for evaluation, it is important to discuss the fact that although the MAE is used in most publications about dose prediction studies and can be a general indication of the performance of the model, the final evaluation of the model should be based on clinically relevant metrics for the treatments under study. Indeed, the MAE takes into account a large part of the dose distribution that is outside relevant organs. Thus, in this study, the main conclusions are drawn based on the DVH metrics for relevant organs (Table 2, Fig. 3) in the test set, which included exactly the same patients in all models.
      We are aware of the limitations of this study, since an increase in the number of training patients in steps of 10 and a maximum of 56 might seem small compared with increments of 50 or 100 to reach a few hundred patients. The latter would probably give us more solid statistics about the influence of the database size on the performance of the model. However, we believe that an increase of 10 to 20 patients in the training database is the order of magnitude that most clinics can achieve in the short term, especially for well-curated and consistent data, as is the case for the homogenized database. The same applies to the size of the testing set, since again, having a much larger set would help to gather more solid statistics about the accuracy of the models. However, the reality is that most clinics could only afford to use about 10–20% of the collected data for the testing phase (i.e. around 5 to 10 patients for a database of 50). It is important to note that the model trained with the subset of 26 patients already performed very well, in comparison to previous studies for dose prediction with deep learning. Recently, Zhang et al. [
      • Zhang J.
      • Liu S.
      • Yan H.
      • Li T.
      • Mao R.
      • Liu J.
      Predicting voxel-level dose distributions for esophageal radiotherapy using densely connected network with dilated convolutions.
      ] developed a dose prediction model using a database of 78 esophageal cancer patients, which achieved a MAE in the body of 3.4% of the prescription dose (i.e. 1.5 Gy for a prescription dose of 45 Gy). Our models trained with 26 patients achieved a MAE below 1.3 Gy on the test set (Table 3). But regardless of external comparisons with other models, which might be biased by the difference in databases and DL architectures, the results from the second experiment suggest that blindly increasing the training database will not bring much improvement to the performance of the model. Instead, more sophisticated solutions, such as active learning, should be used to efficiently train DL models. Active learning [
      • Blanch M.G.
      ] is a type of iterative supervised learning where the algorithm itself queries the user to obtain new data points where they are most needed, in order to build up an optimally balanced training dataset.
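      To make the idea of active learning concrete, the sketch below shows one possible query step. It uses a query-by-committee criterion, in which the cases where several trained models disagree most are proposed for labelling (here, planning). This criterion, the function `select_queries`, and the toy numbers are assumptions chosen for illustration; they are not part of this study, which did not implement active learning.

```python
import numpy as np

def select_queries(committee_predictions, n_queries):
    """Pick the unlabeled cases on which the committee disagrees most.

    committee_predictions : array (n_models, n_cases, n_voxels) of predicted dose
    returns               : indices of the n_queries most uncertain cases
    """
    # Disagreement per case: voxel-wise std across models, averaged over voxels.
    disagreement = committee_predictions.std(axis=0).mean(axis=1)
    order = np.argsort(disagreement)[::-1]
    return order[:n_queries].tolist()

# Toy pool: 5 models, 4 candidate patients, 10 voxels each (hypothetical numbers).
rng = np.random.default_rng(1)
base = rng.uniform(0.0, 45.0, size=(1, 4, 10))
noise_scale = np.array([0.1, 2.0, 0.5, 0.05]).reshape(1, 4, 1)
preds = base + rng.normal(size=(5, 4, 10)) * noise_scale
queries = select_queries(preds, n_queries=2)
```

      In a workflow like the one in this study, the five cross-validation models could play the role of the committee, so that new patients are planned and added only where the current models are least consistent.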
      Another important point to be discussed is how the predicted dose distributions can be used to generate the final treatment plan in an end-to-end automatic planning workflow. As previously mentioned, the two main approaches used so far have been either dose mimicking strategies [
      • Fan J.
      • Wang J.
      • Chen Z.
      • Hu C.
      • Zhang Z.
      • Hu W.
      Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      ,
      • Petersson K.
      • Nilsson P.
      • Engström P.
      • Knöös T.
      • Ceberg C.
      Evaluation of dual-arc VMAT radiotherapy treatment plans automatically generated via dose mimicking.
      ,
      • McIntosh C.
      • Welch M.
      • McNiven A.
      • Jaffray D.A.
      • Purdie T.G.
      Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
      ] or inverse optimization [
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      Knowledge-based automated planning with three-dimensional generative adversarial networks.
      ,
      ,
      • Babier A.
      • Boutilier J.J.
      • Sharpe M.B.
      • McNiven A.L.
      • Chan T.C.Y.
      Inverse optimization of objective function weights for treatment planning using clinical dose-volume histograms.
      ]. A recently published paper from Babier et al. [
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      The importance of evaluating the complete automated knowledge-based planning pipeline.
      ] reported that the quality of the final treatment plan depends on how well the prediction and plan generation models (dose mimicking or inverse optimization) perform together. This demonstrates the importance of evaluating the full pipeline when aiming for clinical implementation of automatic planning workflows based on DL dose prediction. However, they were unable to isolate the root cause of those effects. Thus, we believe that further research is needed in order to analyse the source of errors for the prediction and plan generation models, independently. In this context, our article sheds some light on the possible ways to increase the accuracy of DL methods, by carefully designing homogeneous and high-quality databases. In addition, it is important to mention that very recently, a few publications have investigated how to directly predict the fluence maps with ML/DL models [
      • Wang W.
      • Sheng Y.
      • Wang C.
      • Zhang J.
      • Li X.
      • Palta M.
      • et al.
      Fluence Map Prediction Using Deep Learning Models – Direct Plan Generation for Pancreas Stereotactic Body Radiation Therapy.
      ,
      • Sheng Y.
      • Li T.
      • Yoo S.
      • Yin F.-F.
      • Blitzblau R.
      • Horton J.K.
      • et al.
      Automatic planning of whole breast radiation therapy using machine learning models.
      ,
      • Ma L.
      • Chen M.
      • Gu X.
      • Lu W.
      Deep learning-based inverse mapping for fluence map prediction.
      ,
      • Li X.
      • Zhang J.
      • Sheng Y.
      • Chang Y.
      • Yin F.-F.
      • Ge Y.
      • et al.
      Automatic IMRT planning via static field fluence prediction (AIP-SFFP): a deep learning algorithm for real-time prostate treatment planning.
      ,
      • Lee H.
      • Kim H.
      • Kwak J.
      • Kim Y.S.
      • Lee S.W.
      • Cho S.
      • et al.
      Fluence-map generation for prostate intensity-modulated radiotherapy planning using a deep-neural-network.
      ], without the need for dose mimicking or inverse optimization in the final plan generation. Future research should investigate this novel approach further and compare it with existing strategies for automatic planning, including those based on DL dose prediction.

      7. Conclusion

      To summarize, this study aims to raise awareness of the importance of the training database for deep learning applications, and in particular for dose prediction in radiation therapy. The experiments quantified the effect of data quality and quantity on the model’s performance, in the specific framework of dose prediction for esophageal cancer patients treated with IMRT. On the one hand, our results demonstrate that a higher-quality database, with reduced user-dependent variability in delineation and planning, improves the performance of the DL model (E26-HomDB versus E26-VarDB). On the other hand, the results also suggest that increasing the size of the database may either have a detrimental effect on the model’s performance when the newly added data are of low quality (E26-, E36-, E46-, and E56-VarDB), or have a negligible positive effect when high-quality data are added (E26-, E36-, E46-, and E56-HomDB). The latter motivates further research into strategies for building and expanding medical databases more efficiently.
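
      As a minimal sketch of how such a comparison between two models can be quantified, the snippet below computes a global mean absolute error (MAE) over body voxels and the difference between two sets of predictions. The function name `global_mae`, the flattened voxel lists, the body mask, and all dose values are illustrative assumptions and are not taken from the study.

```python
def global_mae(predicted, reference, body_mask):
    """Mean absolute dose error (Gy) over the voxels flagged as inside the body.

    All three arguments are flat sequences of equal length, one entry per voxel.
    """
    if not (len(predicted) == len(reference) == len(body_mask)):
        raise ValueError("inputs must have the same number of voxels")
    errors = [abs(p - r) for p, r, inside in zip(predicted, reference, body_mask) if inside]
    if not errors:
        raise ValueError("body_mask selects no voxels")
    return sum(errors) / len(errors)


# Toy example with made-up doses in Gy (hypothetical, not data from the paper).
reference = [40.0, 20.0, 10.0, 0.0]
body = [True, True, True, False]   # the last voxel lies outside the body
pred_a = [41.0, 19.0, 11.0, 3.0]   # stand-in for a model trained on one database
pred_b = [40.5, 20.5, 9.5, 0.0]    # stand-in for a model trained on another

mae_a = global_mae(pred_a, reference, body)
mae_b = global_mae(pred_b, reference, body)
delta = mae_a - mae_b              # positive when the second model is more accurate
print(f"MAE A = {mae_a:.2f} Gy, MAE B = {mae_b:.2f} Gy, difference = {delta:.2f} Gy")
```

      Restricting the average to body voxels keeps the air surrounding the patient from diluting the error, which is why the error of the first set of predictions in the excluded voxel does not affect its MAE.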

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgments

      Ana Barragán is funded by the Walloon region in Belgium (PROTHERWAL/CHARP, grant number 7289). Melissa Thomas was supported by Kom op tegen Kanker (Stand up to Cancer), the Flemish cancer society. Gilles Defraene is a postdoctoral fellow of the Research Foundation Flanders (FWO, project 1292021N). Karin Haustermans was a senior clinical investigator at the Research Foundation Flanders (FWO). John A. Lee is a Senior Research Associate with the F.R.S.-FNRS.

      Appendix A. Supplementary data

      Supplementary data to this article are available with the online version.

      References

        • Moore K.L.
        • Schmidt R.
        • Moiseenko V.
        • Olsen L.A.
        • Tan J.
        • Xiao Y.
        • et al.
        Quantifying unnecessary normal tissue complication risks due to suboptimal planning: A Secondary Study of RTOG 0126.
        Int J Radiat Oncol Biol Phys. 2015; 92: 228-235
        • Marcello M.
        • Ebert M.
        • Haworth A.
        • Steigler A.
        • Kennedy A.
        • Joseph D.
        • et al.
        Association between treatment planning and delivery factors and disease progression in prostate cancer radiotherapy: Results from the TROG 03.04 RADAR trial.
        Radiother Oncol. 2018; 126: 249-256
        • Craft D.
        • McQuaid D.
        • Wala J.
        • Chen W.
        • Salari E.
        • Bortfeld T.
        Multicriteria VMAT optimization.
        Med Phys. 2012; 39: 686-696 https://doi.org/10.1118/1.3675601
        • Breedveld S.
        • Storchi P.R.M.
        • Voet P.W.J.
        • Heijmen B.J.M.
        iCycle: Integrated, multicriterial beam angle, and profile optimization for generation of coplanar and noncoplanar IMRT plans.
        Med Phys. 2012; 39: 951-963
        • Ge Y.
        • Wu Q.J.
        Knowledge-based planning for intensity-modulated radiation therapy: A review of data-driven approaches.
        Med Phys. 2019; 46: 2760-2775
        • Hussein M.
        • Heijmen B.J.M.
        • Verellen D.
        • Nisbet A.
        Automation in intensity modulated radiotherapy treatment planning – A review of recent innovations.
        British J Radiol. 2018; 91: 20180270 https://doi.org/10.1259/bjr.20180270
        • Zarepisheh M.
        • Long T.
        • Li N.
        • Tian Z.
        • Romeijn H.E.
        • Jia X.
        • et al.
        A DVH-guided IMRT optimization algorithm for automatic treatment planning and adaptive radiotherapy replanning.
        Med Phys. 2014; 41: 061711
        • Fogliata A.
        • Nicolini G.
        • Bourgier C.
        • Clivio A.
        • De Rose F.
        • Fenoglietto P.
        • et al.
        Performance of a knowledge-based model for optimization of volumetric modulated Arc therapy plans for single and bilateral breast irradiation.
        PLoS One. 2015; 10: e0145137
        • Fogliata A.
        • Reggiori G.
        • Stravato A.
        • Lobefalo F.
        • Franzese C.
        • Franceschini D.
        • et al.
        RapidPlan head and neck model: The objectives and possible clinical benefit.
        Radiat Oncol. 2017; 12: 73
        • Tol J.P.
        • Delaney A.R.
        • Dahele M.
        • Slotman B.J.
        • Verbakel W.F.A.R.
        Evaluation of a knowledge-based planning solution for head and neck cancer.
        Int J Radiat Oncol Biol Phys. 2015; 91: 612-620
        • Wu B.
        • Ricchetti F.
        • Sanguineti G.
        • Kazhdan M.
        • Simari P.
        • Jacques R.
        • et al.
        Data-driven approach to generating achievable dose-volume histogram objectives in intensity-modulated radiotherapy planning.
        Int J Radiat Oncol Biol Phys. 2011; 79: 1241-1247
        • Valdes G.
        • Simone 2nd, C.B.
        • Chen J.
        • Lin A.
        • Yom S.S.
        • Pattison A.J.
        • et al.
        Clinical decision support of radiotherapy treatment planning: A data-driven machine learning strategy for patient-specific dosimetric decision making.
        Radiother Oncol. 2017; 125: 392-397
        • Boldrini L.
        • Bibault J.-E.
        • Masciocchi C.
        • Shen Y.
        • Bittner M.-I.
        Deep learning: A review for the radiation oncologist.
        Front Oncol. 2019; 9: 977
        • Shen C.
        • Nguyen D.
        • Zhou Z.
        • Jiang S.B.
        • Dong B.
        • Jia X.
        An introduction to deep learning in medical physics: Advantages, potential, and challenges.
        Phys Med Biol. 2020; 65: 05TR01
        • Shao Y.
        • Zhang X.
        • Wu G.
        • Gu Q.
        • Wang J.
        • Ying Y.
        • et al.
        Prediction of three-dimensional radiotherapy optimal dose distributions for lung cancer patients with asymmetric network.
        IEEE J Biomed Health Inform. 2020; https://doi.org/10.1109/JBHI.2020.3025712
        • Zhou J.
        • Peng Z.
        • Song Y.
        • Chang Y.
        • Pei X.
        • Sheng L.
        • et al.
        A method of using deep learning to predict three-dimensional dose distributions for intensity-modulated radiotherapy of rectal cancer.
        J Appl Clin Med Phys. 2020; 21: 26-37
        • Barragán-Montero A.M.
        • Nguyen D.
        • Lu W.
        • Lin M.-H.
        • Norouzi-Kandalan R.
        • Geets X.
        • et al.
        Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
        Med Phys. 2019; 46: 3679-3691 https://doi.org/10.1002/mp.13597
        • Fan J.
        • Wang J.
        • Chen Z.
        • Hu C.
        • Zhang Z.
        • Hu W.
        Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
        Med Phys. 2019; 46: 370-381
        • Nguyen D.
        • Jia X.
        • Sher D.
        • Lin M.-H.
        • Iqbal Z.
        • Liu H.
        • et al.
        3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
        Phys Med Biol. 2019; 64: 065020
        • Nguyen D.
        • Long T.
        • Jia X.
        • Lu W.
        • Gu X.
        • Iqbal Z.
        • et al.
        A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning.
        2017
        • Chen X.
        • Men K.
        • Li Y.
        • Yi J.
        • Dai J.
        A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning.
        Med Phys. 2019; 46: 56-64 https://doi.org/10.1002/mp.13262
        • Kearney V.
        • Chan J.W.
        • Haaf S.
        • Descovich M.
        • Solberg T.D.
        DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks.
        Phys Med Biol. 2018; 63: 235022
        • Petersson K.
        • Nilsson P.
        • Engström P.
        • Knöös T.
        • Ceberg C.
        Evaluation of dual-arc VMAT radiotherapy treatment plans automatically generated via dose mimicking.
        Acta Oncol. 2016; 55: 523-525
        • McIntosh C.
        • Welch M.
        • McNiven A.
        • Jaffray D.A.
        • Purdie T.G.
        Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
        Phys Med Biol. 2017; 62: 5926-5944
        • Babier A.
        • Mahmood R.
        • McNiven A.L.
        • Diamant A.
        • Chan T.C.Y.
        Knowledge-based automated planning with three-dimensional generative adversarial networks.
        Med Phys. 2020; 47: 297-306
        Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks.
        2018; 85: 484-499
        • Babier A.
        • Boutilier J.J.
        • Sharpe M.B.
        • McNiven A.L.
        • Chan T.C.Y.
        Inverse optimization of objective function weights for treatment planning using clinical dose-volume histograms.
        Phys Med Biol. 2018; 63: 105004
        • Liu Z.
        • Fan J.
        • Li M.
        • Yan H.
        • Hu Z.
        • Huang P.
        • et al.
        A deep learning method for prediction of three-dimensional dose distribution of helical tomotherapy.
        Med Phys. 2019; 46: 1972-1983
        • Song Y.
        • Hu J.
        • Liu Y.
        • Hu H.
        • Huang Y.
        • Bai S.
        • et al.
        Dose prediction using a deep neural network for accelerated planning of rectal cancer radiotherapy.
        Radiother Oncol. 2020; 149: 111-116
        • Murakami Y.
        • Magome T.
        • Matsumoto K.
        • Sato T.
        • Yoshioka Y.
        • Oguchi M.
        Fully automated dose prediction using generative adversarial networks in prostate cancer patients.
        PLoS One. 2020; 15: e0232697
        • Kandalan R.N.
        • Nguyen D.
        • Rezaeian N.H.
        • Barragán-Montero A.M.
        • Breedveld S.
        • Namuduri K.
        • et al.
        Dose prediction with deep learning for prostate cancer radiation therapy: Model adaptation to different treatment planning practices.
        Radiother Oncol. 2020; 153: 228-235
        • Kearney V.
        • Chan J.W.
        • Wang T.
        • Perry A.
        • Descovich M.
        • Morin O.
        • et al.
        DoseGAN: a generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation.
        Sci Rep. 2020; 10: 11073
        • Ma M.
        • Kovalchuk N.
        • Buyyounouski M.K.
        • Xing L.
        • Yang Y.
        Incorporating dosimetric features into the prediction of 3D VMAT dose distributions using deep convolutional neural network.
        Phys Med Biol. 2019; 64: 125017
        • Kajikawa T.
        • Kadoya N.
        • Ito K.
        • Takayama Y.
        • Chiba T.
        • Tomori S.
        • et al.
        A convolutional neural network approach for IMRT dose distribution prediction in prostate cancer patients.
        J Radiat Res. 2019; 60: 685-693
        • Guerreiro F.
        • Seravalli E.
        • Janssens G.O.
        • Maduro J.H.
        • Knopf A.C.
        • Langendijk J.A.
        • et al.
        Deep learning prediction of proton and photon dose distributions for paediatric abdominal tumours.
        Radiother Oncol. 2020; 156: 36-42
        • Ronneberger O.
        • Fischer P.
        • Brox T.
        U-Net: Convolutional Networks for Biomedical Image Segmentation.
        in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, 2015: 234-241
        • Goodfellow I.J.
        • Pouget-Abadie J.
        • Mirza M.
        • Xu B.
        • Warde-Farley D.
        • Ozair S.
        • et al.
        Generative Adversarial Networks.
        arXiv [stat.ML]. 2014
        • Zhang J.
        • Liu S.
        • Li T.
        • Mao R.
        • Du C.
        • Liu J.
        Voxel-level radiotherapy dose prediction using densely connected network with dilated convolutions.
        Art Intell Radiat Therapy. 2019; 70-77 https://doi.org/10.1007/978-3-030-32486-5_9
        • Nguyen D.
        • McBeth R.
        • Sadeghnejad Barkousaraie A.
        • Bohara G.
        • Shen C.
        • Jia X.
        • et al.
        Incorporating human and learned domain knowledge into training deep neural networks: A differentiable dose-volume histogram and adversarial inspired framework for generating Pareto optimal dose distributions in radiation therapy.
        Med Phys. 2020; 47: 837-849
        • Sun C.
        • Shrivastava A.
        • Singh S.
        • Gupta A.
        Revisiting Unreasonable Effectiveness of Data in Deep Learning Era.
        in: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. https://doi.org/10.1109/iccv.2017.97
        • Willemink M.J.
        • Koszek W.A.
        • Hardell C.
        • Wu J.
        • Fleischmann D.
        • Harvey H.
        • et al.
        Preparing Medical Imaging Data for Machine Learning.
        Radiology. 2020; 295: 4-15
        • Jabbour S.K.
        • Hashem S.A.
        • Bosch W.
        • Kim T.K.
        • Finkelstein S.E.
        • Anderson B.M.
        • et al.
        Upper abdominal normal organ contouring guidelines and atlas: a Radiation Therapy Oncology Group consensus.
        Pract Radiat Oncol. 2014; 4: 82-89
        • Feng M.
        • Moran J.M.
        • Koelling T.
        • Chughtai A.
        • Chan J.L.
        • Freedman L.
        • et al.
        Development and validation of a heart atlas to study cardiac exposure to radiation following treatment for breast cancer.
        Int J Radiat Oncol Biol Phys. 2011; 79: 10-18
        • Kong F.M.
        • Quint L.
        • Machtay M.
        • Bradley J.
        Atlas for organs at risk (OARs) in thoracic radiation therapy.
        2013
        • Ajani J.A.
        • D’Amico T.A.
        • Bentrem D.J.
        • Chao J.
        • Corvera C.
        • Das P.
        • et al.
        Esophageal and Esophagogastric Junction Cancers, Version 2.2019, NCCN Clinical Practice Guidelines in Oncology.
        J Natl Compr Canc Netw. 2019; 17: 855-883
        • Beukema J.C.
        • Kawaguchi Y.
        • Sijtsema N.M.
        • Zhai T.-T.
        • Langendijk J.A.
        • van Dijk L.V.
        • et al.
        Can we safely reduce the radiation dose to the heart while compromising the dose to the lungs in oesophageal cancer patients?.
        Radiother Oncol. 2020; 149: 222-227
        • Thomas M.
        • Defraene G.
        • Lambrecht M.
        • Deng W.
        • Moons J.
        • Nafteux P.
        • et al.
        NTCP model for postoperative complications and one-year mortality after trimodality treatment in oesophageal cancer.
        Radiother Oncol. 2019; 141: 33-40
        • Lee H.K.
        • Vaporciyan A.A.
        • Cox J.D.
        • Tucker S.L.
        • Putnam J.B.
        • Ajani J.A.
        • et al.
        Postoperative pulmonary complications after preoperative chemoradiation for esophageal carcinoma: correlation with pulmonary dose–volume histogram parameters.
        Int J Radiat Oncol Biol Phys. 2003; 57: 1317-1322 https://doi.org/10.1016/s0360-3016(03)01373-7
        • Wang S.-L.
        • Liao Z.
        • Vaporciyan A.A.
        • Tucker S.L.
        • Liu H.
        • Wei X.
        • et al.
        Investigation of clinical and dosimetric factors associated with postoperative pulmonary complications in esophageal cancer patients treated with concurrent chemoradiotherapy followed by surgery.
        Int J Radiat Oncol Biol Phys. 2006; 64: 692-699
        • Tucker S.L.
        • Liu H.H.
        • Wang S.
        • Wei X.
        • Liao Z.
        • Komaki R.
        • et al.
        Dose-volume modeling of the risk of postoperative pulmonary complications among esophageal cancer patients treated with concurrent chemoradiotherapy followed by surgery.
        Int J Radiat Oncol Biol Phys. 2006; 66: 754-761
        • Liu L.
        • Cheng J.
        • Quan Q.
        • Wu F.-X.
        • Wang Y.-P.
        • Wang J.
        A survey on U-shaped networks in medical image segmentations.
        Neurocomputing. 2020; 409: 244-258 https://doi.org/10.1016/j.neucom.2020.05.070
        • Wu C.
        • Nguyen D.
        • Xing Y.
        • Barragan A.
        • Schuemann J.
        • Shang H.
        • et al.
        Improving proton dose calculation accuracy by using deep learning.
        Mach Learn: Sci Technol. 2020; https://doi.org/10.1088/2632-2153/abb6d5
        • Xing Y.
        • Zhang Y.
        • Nguyen D.
        • Lin M.-H.
        • Lu W.
        • Jiang S.
        Boosting radiotherapy dose calculation accuracy with deep learning.
        J Appl Clin Med Phys. 2020; https://doi.org/10.1002/acm2.12937
        • Huang G.
        • Liu Z.
        • Van Der Maaten L.
        • Weinberger K.Q.
        Densely Connected Convolutional Networks.
        in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. https://doi.org/10.1109/cvpr.2017.243
        • Mayo C.S.
        • Kessler M.L.
        • Eisbruch A.
        • Weyburne G.
        • Feng M.
        • Hayman J.A.
        • et al.
        The big data effort in radiation oncology: Data mining or data farming?.
        Adv Radiat Oncol. 2016; 1: 260-271 https://doi.org/10.1016/j.adro.2016.10.001
        • Zhang J.
        • Liu S.
        • Yan H.
        • Li T.
        • Mao R.
        • Liu J.
        Predicting voxel-level dose distributions for esophageal radiotherapy using densely connected network with dilated convolutions.
        Phys Med Biol. 2020; 65: 205013
        • Blanch M.G.
        Active Deep Learning for Medical Imaging Segmentation. 2017
        • Babier A.
        • Mahmood R.
        • McNiven A.L.
        • Diamant A.
        • Chan T.C.Y.
        The importance of evaluating the complete automated knowledge-based planning pipeline.
        Phys Med. 2020; 72: 73-79
        • Wang W.
        • Sheng Y.
        • Wang C.
        • Zhang J.
        • Li X.
        • Palta M.
        • et al.
        Fluence Map Prediction Using Deep Learning Models – Direct Plan Generation for Pancreas Stereotactic Body Radiation Therapy.
        Front Artif Intell. 2020; 3. https://doi.org/10.3389/frai.2020.00068
        • Sheng Y.
        • Li T.
        • Yoo S.
        • Yin F.-F.
        • Blitzblau R.
        • Horton J.K.
        • et al.
        Automatic planning of whole breast radiation therapy using machine learning models.
        Front Oncol. 2019; 9. https://doi.org/10.3389/fonc.2019.00750
        • Ma L.
        • Chen M.
        • Gu X.
        • Lu W.
        Deep learning-based inverse mapping for fluence map prediction.
        Phys Med Biol. 2020; https://doi.org/10.1088/1361-6560/abc12c
        • Li X.
        • Zhang J.
        • Sheng Y.
        • Chang Y.
        • Yin F.-F.
        • Ge Y.
        • et al.
        Automatic IMRT planning via static field fluence prediction (AIP-SFFP): a deep learning algorithm for real-time prostate treatment planning.
        Phys Med Biol. 2020; 65: 175014
        • Lee H.
        • Kim H.
        • Kwak J.
        • Kim Y.S.
        • Lee S.W.
        • Cho S.
        • et al.
        Fluence-map generation for prostate intensity-modulated radiotherapy planning using a deep-neural-network.
        Sci Rep. 2019; 9: 15671