Machine learning models predicted dose distributions for left-sided breast cancer.
A simple U-net-based network can produce clinically acceptable dose distributions.
The 6-layer 3D neural network outperformed the 7-layer 2D neural network.
Predictions consistently had a lower left-lung V20 than the clinical plans.
The mean dose difference was always within 0.02% of the dose prescription for both models.
Purpose: To develop deep learning models capable of producing clinically acceptable dose distributions for 3D-CRT treatment of left-sided breast cancer, while exploring the use of two-dimensional versus three-dimensional anatomical data.
Methods: Two deep learning models, one two-dimensional and one three-dimensional, based on the U-net architecture were trained to predict dose distributions given anatomical information and the dose prescription. The input consists of 6 channels: the patient CT along with binary masks for four OARs and one covering the volume receiving 95% of the prescribed dose (based on the clinical plan). A training set of 120 patients was compiled and used with 5-fold cross-validation. The best-performing model from the 5 folds was analyzed on a test set of 25 patients using cumulative DVHs, mean differences in mean dose to OARs represented by box plots, and the V20 of the left lung.
Results: We have shown that both models are capable of producing clinically acceptable dose distributions, with the 3D model outperforming the 2D model. The average difference in mean dose is within 0.02% of the dose prescription for both models. The V20 values from the predicted dose distributions are comparable with those from the clinical plans, with predictions tending to be slightly lower.
Conclusions: Based on the results, the models could be implemented clinically to produce dose distributions that serve as a reference, ensuring the most ideal plan is used. Each prediction is patient-specific and requires minimal time and information, creating a new standard in plan quality without hindering the planning process.
]. While the technology used to perform these treatments has advanced, treatment planning has moved forward at a slower rate, as plans have become more complex while relying heavily on the skills and decisions of individual planners [
]. These plans center around producing an ideal dose distribution that maximizes dose to the planning treatment volume (PTV) while minimizing dose to the organs-at-risk (OARs). However, DVH-based planning lacks spatial information in the cumulative DVH diagrams, requiring the production of dose distributions, which increases the amount of intervention necessary and the time needed to produce plans [
]. This shows promise for the use of AI in treatment planning, but many of these papers use only a two-dimensional or only a three-dimensional model.
In this work, we develop and compare two-dimensional and three-dimensional deep learning models for dose distribution prediction of left-sided breast cancers, based on the U-net architecture, for efficient and precise results. The models are trained on patient anatomy and dose prescription for patients with varying beam configurations and beam energies based on three-dimensional conformal radiation therapy (3D-CRT) plans. They are then analyzed using cumulative dose-volume histograms, mean differences in mean dose to each region of interest weighted by the dose prescription, and V20. By producing a model capable of predicting clinically acceptable distributions, we can provide physicians and planners with a patient-specific reference point of an ideal dose distribution to aim for when developing plans. The dose distributions are produced quickly with minimal information, ensuring that the most optimal plans are used without compromising efficiency in the planning process, and creating a new standard in the quality of plans.
2.1 Patient data and preprocessing
A total of 145 left-sided breast cancer patients were retrospectively selected for this study, all treated using 3D-CRT (forward planning) from 2017 to 2020 with 2 or 4 tangential beams having energies of 6, 10, and/or 18 MV. Each patient had a prescribed dose of 42.56 Gy in 16 fractions. Patient CT images, contoured structures, and their clinically delivered dose distributions were retrieved from the Pinnacle treatment planning system as Digital Imaging and Communications in Medicine (DICOM) files. This data was processed using a combination of the software 3D Slicer [
]. These contoured structures include the left lung, contralateral lung, heart, spinal canal, and the area covered by the 95% isodose curve (Fig. 1). The majority of structures were contoured by trained treatment planners, with some of the contralateral lung and spinal canal structures generated through auto-segmentation and subsequently reviewed by the first author. The structure containing the 95% dose area was used to represent the planning treatment volume (PTV) due to limitations of the clinical data available.
For the two-dimensional model, each patient volume was resampled through linear interpolation and separated into 128 slices, resulting in a total of 18,560 slices for training, validation, and testing. The contoured organs-at-risk (OARs) were separated into individual binary masks, with a value of 1 indicating a pixel containing the corresponding structure and 0 indicating background. The 95% dose area was also set as a binary mask but used the dose prescription rather than a value of 1. The CT images were normalized with min–max normalization to values ranging from 0 to 1, so that they are on a similar scale to the masks and provide equal contributions to the model, avoiding bias. The CT images were given as input to the model with each binary mask included as its own channel, for a total of 6 channels. Data augmentation was used to prevent overfitting: approximately 30% of samples were randomly flipped along the vertical axis. The output of the model is the slice-by-slice dose distribution prediction, as the model learns to interpret the pixel values as exact dose values.
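As an illustration of this preprocessing (not the authors' code), the assembly of the 6-channel input for one slice might look like the following numpy sketch; the function and argument names are our own, and the in-plane resolution is left unspecified:

```python
import numpy as np

def build_input_channels(ct_slice, oar_masks, ptv95_mask, prescription_gy):
    """Assemble the 6-channel model input for one slice.

    ct_slice: 2D array of raw CT values.
    oar_masks: list of four binary (0/1) OAR masks.
    ptv95_mask: binary mask of the 95% isodose region; it is scaled by
                the prescription dose rather than kept as 0/1.
    """
    # Min-max normalize the CT to [0, 1] so it is on the masks' scale.
    lo, hi = ct_slice.min(), ct_slice.max()
    if hi > lo:
        ct_norm = (ct_slice - lo) / (hi - lo)
    else:
        ct_norm = np.zeros_like(ct_slice, dtype=float)
    channels = [ct_norm] + [m.astype(float) for m in oar_masks]
    channels.append(ptv95_mask.astype(float) * prescription_gy)
    return np.stack(channels, axis=-1)  # shape (H, W, 6)
```

The 95% isodose channel carries the prescription dose itself, which is how the dose prescription enters the network alongside the anatomy.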
The data for the three-dimensional model underwent identical processing. Randomized sub-volumes containing the CT images and binary masks were then given as input to the model, which allows for inherent data augmentation through the translation of samples. A voxel-wise dose distribution prediction is output from the model.
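The randomized sub-volume sampling described above amounts to a random crop with a translated origin; a minimal numpy sketch (names and shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def random_subvolume(volume, sub_shape, rng=None):
    """Crop a random sub-volume from a (D, H, W, C) array.

    Randomly translating the crop origin provides the inherent data
    augmentation described for the 3D model, since the network rarely
    sees exactly the same portion of the patient volume twice.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, h, w = volume.shape[:3]
    sd, sh, sw = sub_shape
    z = rng.integers(0, d - sd + 1)
    y = rng.integers(0, h - sh + 1)
    x = rng.integers(0, w - sw + 1)
    return volume[z:z + sd, y:y + sh, x:x + sw]
```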
2.2 Model architecture
The two-dimensional model consists of a 7-level U-net [
] architecture with an additional three CNN layers at the end, as seen in Fig. 2. This architecture allows for fast training over the images without sacrificing precision, through the use of several skip connections. The additional CNN layers allow for a smoother decrease in the number of filters from 24 to 1, with improved accuracy. Seven levels were chosen to maximize computational use, progressively reducing the spatial dimensions of the input data through the convolution and max pooling kernels and restoring them through transposed convolutions. The model was tested with various rates of dropout, and it performed best with a maximum dropout rate of 0.5 at the bottom level, decreasing by 0.1 for every level up. The dropout is applied after the max pooling and transposed convolutional layers. Mean squared error (MSE) was chosen as the loss function; it is the average squared difference between the predicted value, ŷᵢ, and the actual value, yᵢ, over N data points: MSE = (1/N) Σᵢ (ŷᵢ − yᵢ)². Adam [
], short for adaptive moment estimation, was used as the optimizer to update parameters with an initial learning rate of 0.0001. Rectified linear unit (ReLU) was used as the activation function following each convolution.
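The per-level dropout schedule and the MSE loss described above are simple to state in code; the following is an illustrative sketch (function names are ours, not the authors' implementation):

```python
import numpy as np

def dropout_schedule(n_levels=7, max_rate=0.5, step=0.1):
    """Dropout rate per U-net level, deepest level first.

    The text describes a maximum rate of 0.5 at the bottom level,
    decreasing by 0.1 for every level up (floored at 0).
    """
    return [max(0.0, max_rate - step * i) for i in range(n_levels)]

def mse(y_pred, y_true):
    """Mean squared error: average squared difference between the
    predicted and actual values over all N data points."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))
```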
The three-dimensional model has a similar structure, with a 6-level U-net architecture and an additional three CNN layers at the end, shown in Fig. 3. Three-dimensional convolution, max pooling, and transposed convolution kernels were used. The models were built using the Python library Keras [
] as the backend. Both models were trained on dual NVIDIA Geforce RTX 2080 Ti GPUs with 11 GB RAM each using data parallelism for the two-dimensional model and model parallelism for the three-dimensional model.
2.3 Training and evaluation
A set of 25 patients was put aside for testing, leaving 120 patients for training and validation. To make use of the full dataset, we used 5-fold cross-validation for analysis, which splits the training data into 5 groups of equal size. We then train the model on the first 4 groups while using the last for validation; this is known as fold 1 and produces the first model. Fold 2 then consists of using groups 1 and 3–5 to train while using group 2 for validation, producing a second model. This process repeats until the model has been trained and validated over all data, allowing usage of the entire dataset while preventing the model from being validated on previously seen data. A batch size of 32 slices was chosen as input for the two-dimensional model, with a batch size of 2 sub-volumes for the three-dimensional model. The overall performance of the chosen architecture is then given by the average training and validation loss over all 5 models. The model with the lowest mean squared error on both validation and test data, and thus the best-generalized model, was then used to produce dose predictions. These predictions were visualized and evaluated using box plots of the mean difference in mean dose for each region of interest, expressed as a percentage of the dose prescription, ΔDmean = (Dmean,predicted − Dmean,clinical) / Dprescription × 100%, and box plots comparing the V20 of the dose predictions from both models with the V20 from the clinically used plans, where V20 is the relative volume of the left lung receiving at least 20 Gy.
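The two evaluation quantities, the prescription-weighted mean dose difference and the left-lung V20, can be sketched in numpy as follows (function names are ours; the weighting convention is inferred from the definitions in the text):

```python
import numpy as np

def v20(dose_gy, lung_mask):
    """Relative volume (%) of the masked structure receiving >= 20 Gy."""
    in_structure = dose_gy[lung_mask.astype(bool)]
    return float(100.0 * np.count_nonzero(in_structure >= 20.0) / in_structure.size)

def mean_dose_diff_pct(dose_pred, dose_clin, roi_mask, prescription_gy):
    """Difference in mean ROI dose between prediction and clinical plan,
    expressed as a percentage of the prescription dose."""
    m = roi_mask.astype(bool)
    return float(100.0 * (dose_pred[m].mean() - dose_clin[m].mean()) / prescription_gy)
```

With the 42.56 Gy prescription used here, a 0.02% mean dose difference corresponds to under 0.01 Gy, which gives a sense of the scale of the reported errors.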
The average mean squared error over all 5 folds was graphed as a function of epochs for both training and validation in Fig. 4 where the 2D model was run for 100 epochs and the 3D model was run for 200 epochs.
Predictions were produced for all patients in the test set; to visualize the performance of the models, the dose distributions of two randomly selected patients are compared here. Fig. 5 displays the comparison of dose distributions weighted by the prescription dose for patient A, while Fig. 6 shows a comparison of the cumulative DVHs for the 2D and 3D models against the clinical DVH. The results for patient B can be found in the appendix (Fig. A1, Fig. A2).
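A cumulative DVH such as those in Fig. 6 can be computed from a dose grid and a structure mask; here is a minimal numpy sketch with an assumed bin width (function name ours, not the authors' implementation):

```python
import numpy as np

def cumulative_dvh(dose_gy, structure_mask, bin_width_gy=0.1):
    """Cumulative DVH: % of the structure receiving at least each dose level.

    Returns (dose_bins, volume_pct), suitable for plotting volume_pct
    against dose_bins.
    """
    doses = dose_gy[structure_mask.astype(bool)].ravel()
    bins = np.arange(0.0, doses.max() + bin_width_gy, bin_width_gy)
    # For each dose level, the fraction of voxels receiving at least it.
    volume_pct = np.array([100.0 * np.mean(doses >= b) for b in bins])
    return bins, volume_pct
```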
The V20 for the 25 test patients is represented as a box plot in Fig. 7, comparing the distributions from the 2D predictions, the 3D predictions, and the clinically used plans. The models tend to predict a lower V20 than the clinically used plans, with a mean of 6.00% for the 3D model and 6.51% for the 2D model.
The difference in mean dose to each region of interest, weighted by the dose prescription, for the 25 test patients is represented as a box plot in Fig. 8. It can be seen that both models tend to predict the mean dose for all regions within 0.05% of the prescription dose, excluding outliers. The average dose differences in mean dose with standard deviation for the 2D model and the 3D model can be found in the appendix as Table A1, where paired-sample t-tests were performed. The 2D and 3D models perform similarly in predictions on the left lung and spinal canal, with the 3D model performing slightly better on the right lung and heart, by 0.001%.
We have developed two models that can successfully predict dose distributions for a variety of left-sided breast cancer patients. Overall, the 3D model exceeded the performance of the 2D model, which suggests that the spatial information between slices is significant for improving model quality. The loss for both models reaches a minimum mean squared error of around 2, after about 40 epochs for the 2D model, where it then plateaus, while the 3D model reaches its minimum around 180 epochs. This could be due to increased data augmentation in the 3D data: each sub-volume is randomly generated, decreasing the number of times the model sees the same portion of the patient volume and thus requiring longer training. For the 2D data, the model always receives a full slice, with approximately 30% randomly augmented, so the data are limited to the number of slices.
The distribution of predicted V20 values for the models is very close to that of the V20 from the clinically used plans. The model predictions tend to be slightly below the true values; while this makes them less accurate relative to the plans used, it could allow for more optimal plans with lower doses to the left lung. Comparing the mean V20 over the 25 patients, the 2D model produces predictions closer to those used, whereas the 3D model produces predictions with more optimal doses (lower V20).
The average dose differences in mean dose for the 2D model, weighted by the dose prescription, are also very close to those of the 3D model (Table A1). In terms of mean dose, which takes into account all pixels or voxels contained within a structure, both models perform very well, with the 3D model only slightly better. The results are also very stable, which shows great promise for future clinical implementation of one of the models. In Nguyen et al. (2019), a U-net model is used to predict dose distributions for prostate cancer, and the lowest mean dose difference weighted by dose prescription is 0.48% [
]. All values of mean dose difference weighted by dose prescription for our models were less than 0.06%, exhibiting the high level of accuracy possible with the given parameters and model architecture.
The 2D model had more trainable parameters, a deeper network architecture, and a larger batch size than the 3D model, all of which should favor the 2D model, yet the 3D model achieved better results. This suggests that, to some extent, the type of data carries more importance than the depth of the network. Although the 3D model had greater variation, we believe larger batch sizes with longer training could help stabilize the results; however, this would be much more computationally expensive.
With these models, we can quickly produce optimal dose distributions from limited data to be used as a reference for physicians during planning. Because the models perform consistently, using these references would create a new standard of plan quality and ensure consistency across plans. We can also work towards automated treatment planning to decrease planning times while ensuring patients always receive high-quality care. These dose distributions can be used directly with inverse planning to produce clinically acceptable plans. We predict that, given adequate data, the models could also be expanded to predict dose distributions for various other cancer sites with similar beam geometry without a loss in performance, expanding both the capability and applicability of the models.
We have demonstrated the effectiveness and importance of implementing machine learning in treatment planning by developing two deep learning models to predict dose distributions. In exploring the performance of the two-dimensional and three-dimensional models, we have shown that the spatial information present in 3D data can greatly enhance accuracy, with the 3D model outperforming the 2D model. Implementing this model in a clinical setting has great potential to improve plan quality by providing an optimal distribution for reference and, in the future, could be used to reduce treatment planning times while maintaining high plan quality.
One of the authors N.H. was funded by a Harold E. Johns Studentship provided through Cancer Care Ontario (Ontario Health) and by the Walker Family Cancer Centre.
Appendix A. Appendix
Table A1. Average mean dose difference (%) and standard deviation over 25 test patients for regions of interest in the two-dimensional and three-dimensional models, with the corresponding p-values from paired-samples t-tests.