Original paper| Volume 96, P62-69, April 01, 2022

# Improved healthy tissue sparing in proton therapy of lung tumors using statistically sound robust optimization and evaluation

Published:February 26, 2022

## Highlights

• Worst-case robust planning does not provide quantified confidence levels.
• Uncertainties were explicitly integrated during optimization/evaluation.
• Target robustness on the edge of clinical acceptability was obtained.
• Significant organ at risk sparing was achieved alongside adequate target robustness.

## Abstract

### Introduction

Robust planning is essential in proton therapy for ensuring adequate treatment delivery in the presence of uncertainties. For both robust optimization and evaluation, commonly-used techniques can be overly conservative in selecting error scenarios and lack in providing quantified confidence levels. In this study, established techniques are compared to comprehensive alternatives to assess the differences in target coverage and organ at risk (OAR) dose.

### Method

Thirteen lung cancer patients were planned. Two robust optimization methods were used: scenario selection from marginal probabilities (SSMP) based on using maximum setup and range error values and scenario selection from joint probabilities (SSJP) that selects errors on a predefined 90% hypersurface. Two robust evaluation methods were used: conventional evaluation (CE) based on generating error scenarios from combinations of maximum errors of each uncertainty source and statistical evaluation (SE) via the Monte Carlo dose engine MCsquare which considers scenario probabilities.

### Results

Plans optimized using SSJP had, on average, 0.5 Gy lower dose in CTV D98(worst-case) than SSMP-optimized plans. When evaluated using SE, 92.3% of patients passed our clinical threshold in both optimization methods. Average gains in OAR sparing were recorded when transitioning from SSMP to SSJP: esophagus (0.6 Gy D2(nominal), 0.9 Gy D2(worst-case)), spinal cord (3.9 Gy D2(nominal), 4.1 Gy D2(worst-case)) heart (1.1 Gy Dmean, 1.9% V30), lungs-GTV (1.0 Gy Dmean , 1.9% V30).

### Conclusion

Optimization using SSJP yielded significant OAR sparing in all recorded metrics with a target robustness within our clinical objectives, provided that a more statistically sound robustness evaluation method was used.

## 1. Introduction

Proton therapy aims at accurately delivering curative radiation doses to tumors while reducing exposure to surrounding healthy tissue. Protons display a steep dose fall-off at the end of their range (the so-called “Bragg peak”) resulting in a sharply localized dose peak. The high dose gradients, however, lead to a higher susceptibility to treatment uncertainties. Notable sources of uncertainty include, among others, setup errors, as well as range errors (stemming from the conversion of the CT Hounsfield units – HUs – to physical quantities – stopping powers). Inter- and intra- fraction motion also needs to be considered, particularly for tumors of the thorax due to breathing motion that can induce an undesirable shift or distortion of the dose distribution as a result of displaced density heterogeneities [
• Chang J.Y.
• Zhang X.
• Knopf A.
• Li H.
• Mori S.
• Dong L.
• et al.
Consensus Guidelines for Implementing Pencil-Beam Scanning Proton Therapy for Thoracic Malignancies on Behalf of the PTCOG Thoracic and Lymphoma Subcommittee.
,
• Paganetti H.
Range uncertainties in proton therapy and the role of Monte Carlo simulations.
,
• Brousmiche S.
• Souris K.
• De Xivry J.O.
• Lee J.A.
• Macq B.
• Seco J.
Combined influence of CT random noise and HU-RSP calibration curve nonlinearities on proton range systematic errors.
,
• Kraus K.M.
• Heath E.
• Oelfke U.
Dosimetric consequences of tumour motion due to respiration for a scanned proton beam.
,
• Park P.C.
• Cheung J.P.
• Zhu X.R.
• Lee A.K.
• Sahoo N.
• Tucker S.L.
• et al.
Statistical assessment of proton treatment plans under setup and range uncertainties.
]. Given this, taking uncertainties into account in the planning stage is of paramount importance. This can be achieved via robust optimization which directly incorporates treatment errors in the optimization process [
• Liu W.
• Zhang X.
• Li Y.
• Mohan R.
Robust optimization of intensity modulated proton therapy.
,
• Pflugfelder D.
• Wilkens J.J.
• Oelfke U.
Worst case optimization: A method to account for uncertainties in the optimization of intensity modulated proton therapy.
,
• Fredriksson A.
• Forsgren A.
• Hårdemark B.
Minimax optimization for handling range and setup uncertainties in proton therapy.
,
• Bangert M.
• Hennig P.
• Oelfke U.
Analytical probabilistic modeling for radiation therapy treatment planning.
].
One of the most widely used robust optimization methods, known as ‘worst-case’ robust optimization, aims at achieving adequate target coverage by using combinations of treatment errors to generate scenarios [
• Fredriksson A.
• Forsgren A.
• Hårdemark B.
Minimax optimization for handling range and setup uncertainties in proton therapy.
,
• Unkelbach J.
• Paganetti H.
Robust Proton Treatment Planning: Physical and Biological Optimization.
]. In popular implementations of worst-case robust optimization, such as Fredriksson et al.’s “minimax” optimization [
• Fredriksson A.
• Forsgren A.
• Hårdemark B.
Minimax optimization for handling range and setup uncertainties in proton therapy.
], scenarios are evaluated after each iteration during the optimization process ensuring that the objective function of the current worst-case scenario is minimized. In typical clinical implementations of the worst-case robust optimization workflow, several issues can be identified. Firstly, overly conservative scenarios are being pre-selected due to the scenarios being composed of maximum errors of each uncertainty source [
• Fredriksson A.
• Forsgren A.
• Hårdemark B.
Minimax optimization for handling range and setup uncertainties in proton therapy.
]. For a lung tumor case, the following are commonly used: ±5 mm setup error in each direction, ± 3% image conversion error and three breathing phases including the maximum inhale and exhale [
• Cummings D.
• Tang S.
• Ichter W.
• Wang P.
• Sturgeon J.D.
• Lee A.K.
• et al.
Four-dimensional Plan Optimization for the Treatment of Lung Tumors Using Pencil-beam Scanning Proton Radiotherapy.
,
• Inoue T.
• Widder J.
• van Dijk L.V.
• Takegawa H.
• Koizumi M.
• Takashina M.
• et al.
Limited Impact of Setup and Range Uncertainties, Breathing Motion, and Interplay Effects in Robustly Optimized Intensity Modulated Proton Therapy for Stage III Non-small Cell Lung Cancer.
]. As mentioned by Korevaar et al. [
• Korevaar E.W.
• Habraken S.J.M.
• Scandurra D.
• Kierkels R.G.J.
• Unipan M.
• Eenink M.G.C.
• et al.
Practical robustness evaluation in radiotherapy – A photon and proton-proof alternative to PTV-based plan evaluation.
] and by Sterpin et al. [
• Sterpin E.
• Rivas S.T.
• Van den Heuvel F.
• George B.
• Lee J.A.
• Souris K.
Development of robustness evaluation strategies for enabling statistically consistent reporting.
] this amounts to extremes of marginal probability distributions being combined instead of sampling the joint probability distributions. Secondly, a lack of consistently calculated confidence levels leads to the concept of a ‘worst-case’ becoming hard to define [
• Sterpin E.
• Rivas S.T.
• Van den Heuvel F.
• George B.
• Lee J.A.
• Souris K.
Development of robustness evaluation strategies for enabling statistically consistent reporting.
]. Said limitations are typically met both in robust optimization and evaluation.
While these marginal approaches are widely used in clinical practice, alternative approaches exist that aim to improve upon it. For robust optimization, Buti et al. developed a method of preselecting a set of treatment error scenarios by considering the systematic setup and range uncertainties’ joint probabilities [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
]. Korevaar et al. [
• Korevaar E.W.
• Habraken S.J.M.
• Scandurra D.
• Kierkels R.G.J.
• Unipan M.
• Eenink M.G.C.
• et al.
Practical robustness evaluation in radiotherapy – A photon and proton-proof alternative to PTV-based plan evaluation.
] performed robust evaluation using a statistically consistent but limited set of scenarios. This scenario-based robust evaluation approach yielded clinically viable proton treatment plans that were consistent with PTV evaluations and addressed the limitation of the dose cloud approximation by explicitly calculating treatment scenario doses that describe potential treatment errors. Robustness evaluation with MCsquare, a Monte Carlo dose engine developed by Souris et al. [
• Sterpin E.
• Rivas S.T.
• Van den Heuvel F.
• George B.
• Lee J.A.
• Souris K.
Development of robustness evaluation strategies for enabling statistically consistent reporting.
,
• Souris K.
• Lee J.A.
• Sterpin E.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.
], enables exploring the dosimetric error space in a more statistically consistent manner at a 90% confidence level. Another dose engine-based approach for performing a comprehensive evaluation of a plan’s robustness is the polynomial chaos expansion method as described by Perkó et al. [
• Perkó Z.
• van der Voort S.R.
• van de Water S.
• Hartman C.M.H.
• Hoogeman M.
• Lathouwers D.
Fast and accurate sensitivity analysis of IMPT treatment plans using Polynomial Chaos Expansion.
] which can accurately estimate the dose, its variance and distribution in any particular error scenario. This addresses the need for computing confidence levels based on estimating the treatment error magnitude in order to achieve a given clinical goal in a statistically relevant manner (e.g., CTV coverage in a percentage of patients). To address the loss of robustness in lung cases with significant motion, Taasti et al. [
• Taasti V.T.
• Hattu D.
• Vaassen F.
• Canters R.
• Velders M.
• Mannens J.
• et al.
Treatment planning and 4D robust evaluation strategy for proton therapy of lung tumors with large motion amplitude.
] proposed a joint treatment planning and robust evaluation approach based on generating an internal target volume (ITV) achieving clinically viable plans.
In this publication, we would like to explore whether a clinical benefit can be expected using scenario selection tools with improved statistical foundations, both at the level of robust optimization and evaluation. A workflow including worst-case robust optimization and evaluation via RayStation as performed conventionally in clinical practice is compared to two other tools: a tool that enables scenario selection from joint probabilities developed by Buti et al. [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
] and MCsquare [
• Souris K.
• Lee J.A.
• Sterpin E.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.
]. We have chosen lung tumors here, because of the challenges this location entails with respect to robust planning. By applying those methods on realistic clinical cases, we aim at evaluating their impact on target coverage and organs-at-risk sparing.

## 2. Material and methods

### 2.1. Patient data

The planning database contained 13 lung cancer patients. Patient data consisted of a 4D-CT image set containing ten, evenly spaced in time, breathing phases. A 60 Gy dose prescription over 30 fractions was used with a CTV coverage goal of delivering at least 95% of the prescribed dose (=57 Gy) to 98% of the target volume. Constraints were placed on the organs-at-risk (OARs) on a case-by-case basis, depending on tumor and lymph node size and positioning and proximity to OARs. In all cases, priority was given to maintaining target coverage while remaining below the OAR constraints in the Appendix (Appendix A,
• De Ruysscher D.
• Faivre-Finn C.
• Moeller D.
• Nestle U.
• Hurkmans C.W.
• Le Péchoux C.
• et al.
European Organization for Research and Treatment of Cancer (EORTC) recommendations for planning and delivery of high-dose, high precision radiotherapy for lung cancer.
). Only the target was robustly optimized, while serial organs, such as the spinal cord, were not in an effort to reduce optimization time and rely on more accurate robust evaluation data instead. The CTV size, tumor location relative to the lung and motion amplitude for all patients are given in the Appendix (Table S4).
All treatment plans used the mid-position CT (MidP-CT) as the nominal planning CT [
• Wanet M.
• Sterpin E.
• Janssens G.
• Delor A.
• Lee J.A.
• Geets X.
Validation of the mid-position strategy for lung tumors in helical TomoTherapy.
]. The MidP-CT was reconstructed from the entire 4D CT data set via deformable registration and represents all anatomy, including the tumor, in the time-averaged position over the respiratory cycle. For robust optimization, three additional breathing phases were used: the maximum exhale CT (End_ExH), the maximum inhale CT (End_InH) and the mid ventilation CT (MidV) [
• Borderías Villarroel E.
• Geets X.
• Sterpin E.
Online adaptive dose restoration in intensity modulated proton therapy of lung cancer to account for inter-fractional density changes.
]. The MidV-CT corresponds to the breathing phase in which the tumor’s position is closest to its time-averaged position over the entire respiratory cycle. Motion state specific CTVs were morphed on every breathing phase by means of deformable registration and further edited by doctors. Only four breathing phases were used for optimization to both reduce computation time and because each optimization stage was followed by a comprehensive robustness evaluation using all ten breathing phases which ensured adequate coverage.

### 2.2. Robust optimization

Robust optimization was performed on RayStation 9B on a computer with the following specifications: Intel Xeon Gold 6234 CPU (two 3.30 GHz processors), 128 GB RAM, NVIDIA Quadro RTX 6000 GPU, 2 TB SSD.
Two worst-case scenario selection methods were compared: conventional scenario selection from marginal probabilities (SSMP) and scenario selection from joint probabilities (SSJP), a method of preselecting a limited set of treatment error scenarios developed by Buti et al. [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
]. Scenarios cover geometric uncertainties (setup errors, range errors, motion); interplay was not considered for this study.

#### 2.2.1. SSMP

In SSMP, maximum setup errors calculated using systematic (Σ) and random (σ) setup and baseline shift values were used for two cases: tumor only and tumor with lymph nodes. This was done using van Herk et al.’s margin formula [
• van Herk M.
• Remeijer P.
• Rasch C.
• Lebesque J.V.
The probability of correct target dosage: Dose-population histograms for deriving treatment margins in radiotherapy.
] with the goal of obtaining a margin, M, that ensures a minimum dose is delivered to 90% of the patient population:
$M=2.5∙Σtotal+0.7∙σtotal$
(1)

The limitations of such a simplified approach for determining margins in proton therapy must be acknowledged. Strictly speaking, simple margin recipes cannot be derived in a sound statistical manner in proton therapy because of the failure of the static dose cloud approximation [
• Sterpin E.
• Rivas S.T.
• Van den Heuvel F.
• George B.
• Lee J.A.
• Souris K.
Development of robustness evaluation strategies for enabling statistically consistent reporting.
]. However, as observed by Korevaar et al. [
• Korevaar E.W.
• Habraken S.J.M.
• Scandurra D.
• Kierkels R.G.J.
• Unipan M.
• Eenink M.G.C.
• et al.
Practical robustness evaluation in radiotherapy – A photon and proton-proof alternative to PTV-based plan evaluation.
] when attempting to provide a practical, PTV-less approach for proton treatment planning, applying van Herk et al.’s formula (1) when converting uncertainties into errors in robust optimization is a limited yet suitable approach in most cases.
The calculated total setup error values for each direction were put in our treatment planning system (TPS), RayStation. However, using a margin that exceeds 5 mm in RayStation leads to a significant increase in optimization time due to the system automatically generating intermediate errors to ensure robust coverage, which can be an overconservative approach. As a workaround, instead of optimizing on the CTV, a patient-specific CTV expansion can be generated. From the full value of the margin, 5 mm is subtracted (in each direction) and subsequently used as an isotropic setup error value. The CTV is then expanded by the remaining amount. As an example, for setup margins of 7, 8, 9 mm (x, y, z directions), the CTV would be expanded by 2, 3, 4 mm and optimization would be done on the expanded CTV volume using an isotropic setup error of 5 mm, meaning the total setup errors in all directions would equal the initial values. The complete setup margin values can be found in the Appendix (Tables S1 and S2). Optimization was done on the expanded CTV volume to reduce computation time, while evaluation was done on the CTV using the total uncertainty values.
The standard deviation of the range error resulting from the conversion of the CT Hounsfield units (HUs) to relative stopping power was set equal to 2.6% for optimization and evaluation as per reviewing Paganetti et al. [
• Paganetti H.
Range uncertainties in proton therapy and the role of Monte Carlo simulations.
]. The total number of optimization scenarios is 84: 7 (setup errors: ± 5 mm in ×, y, z directions, additionally the nominal scenario) × 3 (image conversion errors: ±3%, 0%) × 4 (breathing phases: MidP as nominal, MidV, maximum inhale and maximum exhale).

#### 2.2.2. SSJP

The SSJP method aims to address some of the statistical inconsistencies of the conventional approach. Details on the SSJP method can be found in Buti (2019) [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
]. In short, by considering the systematic setup and range uncertainties’ joint probabilities, a 90% 4D-equiprobability hypersurface can be defined as seen in Fig. 1. 29 optimization scenarios are selected by performing a scenario-selection process in two steps: (1) Twelve scenarios that do not exceed the maximum systematic setup error given by 2.5 · Σ are selected on the hypersurface. For these scenarios, an additional error of $0.7∙σtotal$ is added to the systematic setup error in order to obtain errors of magnitude as given by Eq. (1). (2) Seventeen additional scenarios are selected in the whole hypervolume (excluding values higher than 2.5 · Σ) that cover any estimated residual range errors (both under and overshoot). Proton range errors were estimated by converting the breathing CTs into maps of water-equivalent path lengths (WEPLs). A random sampling of the hypervolume allows for an evaluation of the treatment errors in the WEPL space. For each sampling, the treatment errors are applied by rigidly shifting and scaling the WEPL values for the setup and range errors, respectively. As per Buti et al. [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
], by storing WEPL values in the CTV for each scenario, the 17 scenarios that yield the largest estimated range uncertainty in the target volume can be selected.
Considering the nominal scenario, this sums up to a total of 30 scenarios. For each scenario, a virtual CT is generated that represents the selected error. These virtual CTs can be imported in our treatment planning system (TPS) and selected as the set of error scenarios.

### 2.3. Robustness evaluation

The plans were evaluated using two methods: a conventional RayStation approach via scripting that uses setup and range errors to calculate perturbed dose scenarios (conventional evaluation – CE) and a more statistically sound method implemented in MCsquare, a Monte Carlo dose engine developed by Souris et al. [
• Souris K.
• Lee J.A.
• Sterpin E.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.
] to realistically simulate proton PBS treatments (statistical evaluation – SE).
CE consists of combinations of the maximum errors of each uncertainty source such as setup, image conversion and three breathing phases (maximum inhale and exhale and mid ventilation). Similarly to the SSMP robust optimization case, 84 evaluation scenarios are used: 7 (setup errors: ± 5 mm in x,y,z directions, additionally the nominal scenario) × 3 (image conversion errors: ±3%, 0%) × 4 (breathing phases: MidP as nominal, MidV, maximum inhale and maximum exhale).
SE takes a more comprehensive approach by randomly sampling error scenarios and recomputing the dose distributions for all of them while discarding the 10% worst scenarios based on the target D95 [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
]. Each simulated scenario represents the entirety of the treatment (30 fractions comprising 10 breathing phases each). This contrasts the conventional approach in which a scenario represents only a combination of errors occurring in a specific breathing phase, for a specific fraction setup because of the lack of correctly modeled random errors. Each SE fraction is individually simulated considering systematic errors, sampled once per full treatment simulation and random errors re-sampled for each fraction [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
]. SE models setup errors by shifting the beam isocenter and the range errors by scaling CT densities [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
]. By default, breathing motion is simulated by recomputing the dose distribution for each breathing phase and accumulating the dose on the mid-position CT (MidP-CT), after non-rigid registration of each breathing phase to the reference phase [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
], commonly known as an accumulated 4D dose calculation approach and referred for simplicity purposes in the rest of this manuscript as SE. In this work, we also consider an ITV-like approach in which random and systematic errors were compiled for each breathing phase and the worst-case was determined out of the 10 computed dose maps. In other words, for each scenario, target coverage is being assessed and compared for each of the 10 breathing phases it encompasses (SE_ITV). SE_ITV is used with the goal of ensuring the ITV is covered by the minimum clinical dose threshold at every phase, acting as a target-coverage control method.
Overall a complete SE evaluation is composed of 100 scenarios. The number of scenarios was chosen due to the convergence of the D95 uncertainty bands from 100 scenarios onwards maintaining a balance between its confidence interval and impact of the statistical noise and computation time for our patient data size [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
]. A 1.17 × 1.17 × 2 mm dose grid resolution and 108 ions per spot were used. The evaluation results come in the form of dose-volume histograms (DVHs). During evaluation we report for the target coverage the D98 (Gy) nominal and worst-case values as well as Dmean (Gy) and V30 (%) for heart and lungs-GTV and D2 (Gy) for spinal cord and esophagus.

### 2.4. Statistical analysis

A paired t-test was performed to assess the statistical significance of the differences in target and OAR dose between our robust optimization methods and between our evaluation methods. The t value is calculated by using formula (2):
$T=m1-m2σd/n∗100$
(2)

Where m1 and m2 are the mean values of each sample set, σd is the standard deviation of the differences of the paired data values and n is the sample size or the number of paired differences. The t value represents the ratio between the difference between the two sets of data and the difference within them, as such a lower t value correlates to the two data sets being more similar. The p value is also automatically calculated by using the sampling distribution of the test statistic under the null hypothesis with a 95% confidence interval or 5% alpha level, meaning that a value lower than 0.05 or 5% indicates the data did not occur by chance.
The mean dose difference as a percentage of the maximum clinical dose constraint between the two methods was calculated as seen below in formula (3):
$ΔD=d_M∗100$
(3)

$d_$ is the mean difference between dose metrics and M is the organ-specific maximum dose (30 Gy for esophagus, 20 Gy for heart, spinal cord and lungs-GTV, 60 Gy for the CTV).

## 3. Results

The results obtained for CTV coverage quantified by D98 in the nominal and worst-case for each scenario selection and robust evaluation method combination are provided in Table 1 and Table 2. An example of individual DVH metrics for patient 9 is shown in Fig. 2. Table 3 provides the results comparing SE and SE_ITV, verifying that the dose distributions meet our target coverage criteria without significant deviations. In terms of the OAR, the average gains in OAR sparing quantified by Dmean, D2 and V30 when going from SSMP to SSJP are provided in Table 4. The OAR sparing data is provided in the Appendix (Table S5) as well as the complete data for SSMP and SSJP (Tables S6 and S7).
Table 1D98 (Gy) target values for evaluated patients. Nominal values represented in black and worst-case values represented in red. SSMP – robust optimization with scenario selection from marginal probabilities, SSJP – robust optimization with scenario selection from joint probabilities, CE – conventional evaluation, SE – statistical evaluation.
PatientSSMPSSJP
CESECESE
158.9/57.259.4/58.858.9/55.459.4/58.8
258.6/56.558.9/56.658.5/55.158.9/56.4
358.7/57.159.1/58.758.7/56.759.2/57.8
458.6/57.358.9/58.058.6/56.259.1/57.3
558.8/57.859.4/58.858.8/57.659.4/58.6
658.7/57.659.4/58.758.6/57.259.4/58.0
758.5/56.659.3/57.958.6/56.859.3/57.5
858.8/57.859.0/58.358.8/58.059.1/57.5
958.8/57.959.4/58.958.9/57.259.5/58.8
1058.9/57.359.6/59.158.7/57.559.7/58.6
1158.6/57.759.2/59.058.6/57.059.3/58.7
1258.8/56.959.5/58.958.8/56.059.5/58.9
1358.9/57.559.3/58.858.8/56.159.3/58.7
Table 2Average nominal and worst-case D98 (Gy) and standard deviation for each scenario selection and evaluation method combination used. SSMP – robust optimization with scenario selection from marginal probabilities, SSJP – robust optimization with scenario selection from joint probabilities, CE – conventional evaluation, SE – statistical evaluation.
Scenario selection methodEvaluation methodD98 average nominal [Gy]D98 average worst-case [Gy]
SSMPCE58.7 ± 0.157.3 ± 0.4
SE59.3 ± 0.258.5 ± 0.7
SSJPCE58.7 ± 0.156.7 ± 0.8
SE59.3 ± 0.258.1 ± 0.7
Table 3Comparison of D98 (Gy) target values between SE which simulates breathing by recomputing the dose distribution for each phase and accumulating the dose on the MidP-CT and SE_ITV which computes dose maps for each breathing phase. SSMP – robust optimization with scenario selection from marginal probabilities, SSJP – robust optimization with scenario selection from joint probabilities. Nominal values represented in black and worst-case values represented in red.
PatientSSMPSSJP
SESE_ITVSESE_ITV
159.4/58.859.0/58.759.4/58.859.0/58.8
258.9/56.658.2/56.758.9/56.458.4/56.4
359.1/58.758.7/58.359.2/57.858.9/57.6
458.9/58.058.5/58.059.1/57.358.7/57.1
559.4/58.858.9/58.559.4/58.658.9/58.5
659.4/58.759.0/58.359.4/58.059.0/58.0
759.3/57.958.8/57.859.3/57.558.9/57.4
859.0/58.358.7/57.159.1/57.558.8/58.1
959.4/58.959.0/58.759.5/58.859.1/58.1
1059.6/59.159.2/58.959.7/58.659.2/58.8
1158.6/57.758.6/58.758.6/57.058.6/58.3
1259.5/58.959.5/58.759.5/58.959.6/58.6
1359.3/58.859.2/58.359.3/58.758.8/58.1
Table 4Average gains in Dmean (Gy), D2 nom nominal (Gy), D2 wc worst-case (Gy) and V30 (%) when switching from robust optimization with scenario selection from marginal probabilities (SSMP) to robust optimization with scenario selection from joint probabilities (SSJP). Evaluated conventionally (CE) as seen in section 2.3.
OARΔD2 nom (Gy)ΔD2 wc (Gy)
Esophagus0.60.9
Spinal cord3.94.1
Dmean (Gy)V30 (%)
Heart1.11.9
Lungs-GTV1.01.9
The SSJP tool showed lower levels of target robustness than SSMP plans, as visualized in Fig. 3. On average, SSMP optimized plans had a 0.5 Gy higher dose in the D98 worst-case as opposed to their SSJP counterparts. In four out of the thirteen cases, switching from SSMP to optimizing with SSJP while using CE led to the worst-case scenario dose falling below our predefined 57 Gy threshold. Three of the patients did not have adequate target coverage using SSMP and CE which remained the case after being optimized with our alternative tool. This was, however, not the case when evaluated with SE.
Gains in terms of OAR sparing can be seen when optimizing with the SSJP tool in all recorded metrics: up to 1.8 Gy in heart Dmean, 2 Gy in lung-GTV Dmean, 9.7 Gy in spinal cord D2 nominal, 8.8 Gy in spinal cord D2 worst-case, 3.1 Gy in esophagus D2 nominal and 3.6 in esophagus D2 worst-case. The OAR results for each metric are visualized in Fig. 4, Fig. 5, and Fig. 6 with the complete data in Appendix (Table S5).
When conventionally evaluated (CE), three SSMP and seven SSJP patients did not meet our minimum worst-case scenario threshold. On the other hand, when the evaluation is performed by SE, only patient #2 failed to meet the clinical threshold for both methods. D98 target values did not vary significantly between SE and SE_ITV as seen in Table 3.
To verify the statistical relevance of the difference between the sets of data, results from a paired t-test in the form of the t value, the p value and the mean dose difference as a percentage of the organ-specific maximum dose can be seen below in Table 5 for OAR and Table 6 for the target.
Table 5T value, p value and mean dose difference as a percentage of the maximum clinical dose constraint ($ΔD$) between SSMP and SSJP optimization OAR data. Evaluated conventionally (CE) as seen in section 2.3.
Dmean (Gy)T-valueP-value$ΔD$ (%)
Heart9.28.5 · 10-75.4
Lungs-GTV6.62.5 · 10-55.2
D2 (Gy)
Spinal cord (nominal)3.90.3 · 10-219.4
Spinal cord (worst-case)6.22.0 · 10-220.4
Esophagus (nominal)2.72.0 · 10-23.0
Esophagus (worst-case)3.30.7 · 10-24.3
V30 (%)
Heart6.82.0 · 10-5
Lungs-GTV5.96.8 · 10-6
Table 6T value, p value and mean dose difference as a percentage of the target dose prescription ($ΔD$) for worst-case. SSMP – robust optimization with scenario selection from marginal probabilities, SSJP – robust optimization with scenario selection from joint probabilities, CE – conventional evaluation, SE – statistical evaluation.
Method 1Method 2T-valueP-value$ΔD$ (%)
SSMP, CESSMP, SE−8.03.8 · 10-6−2.0
SSJP, CESSJP, SE−5.12.8 · 10-4−2.4
SSMP, SESSMP, SE_ITV3.36.4 · 10-30.5
SSJP, SESSJP, SE_ITV1.50.20.2

## 4. Discussion

We aim to assess the individual impact of each robust optimization (SSMP and SSJP) and evaluation method (CE and SE) as well as their combinations to establish an optimal planning strategy. By comparing said combinations in terms of dose metrics, their impact on target coverage and OAR sparing can be evaluated. The benefits of the increased healthy tissue sparing can further be discussed regarding each method’s clinical viability.
Statistically and clinically significant gains in terms of OAR sparing were recorded for all metrics (Dmean, D2, V30) when using SSJP as seen in Fig. 4, Fig. 5, Fig. 6. Except for one patient, we have not observed clinically significant differences for the esophagus D2, due to it being completely within or having a significant portion of its volume within the target ITV in patients 2–13. An estimate of distances between CTV and OARs has been taken: the esophagus was, on average, located 0.5 cm from the CTV and in 76.9% of patients within the ITV, the heart was, on average, located 0.1 cm from the CTV and in 92.3% of patients within the ITV, the spinal cord was, on average, 3.2 cm from the CTV with no patients having it located within the ITV. This proximity of the CTV to the serial organs, in addition to the relatively large setup and range errors (given in Tables S1 and S2) inevitably led to an increase in difficulty during planning and made it even more important to assess the degree by which OARs can be spared. Comparing both scenario selection methods, plans optimized with the SSJP tool showed lower levels of target robustness than SSMP plans. This was expected as the SSJP tool aims at securing robustness at a predefined 90% confidence level with the aim of achieving a level of target robustness situated at the limit of clinical acceptability [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
]. Most robust treatment strategies found in literature select setup and range errors separately, without considering confidence levels. However, the correlation between an increase in target robustness and a higher dose to OAR is made apparent in both worst-case “minimax” optimization and the PTV approach [
• Liu W.
• Zhang X.
• Li Y.
• Mohan R.
Robust optimization of intensity modulated proton therapy.
,
• Liu W.
• Schild S.E.
• Chang J.Y.
• Liao Z.
• Chang Y.-H.
• Wen Z.
• et al.
Exploratory Study of 4D versus 3D Robust Optimization in Intensity Modulated Proton Therapy for Lung Cancer.
,
• Witte M.G.
• Sonke J.J.
• Siebers J.
• Deasy J.O.
• Van Herk M.
Beyond the margin recipe: The probability of correct target dosage and tumor control in the presence of a dose limiting structure.
]. Conventional, worst-case robust optimization based treatment planning tends towards inputting a series of constraints and seeing whether the plan’s robustness is on par with our target coverage criteria. Ideally when using SSJP, a robust objective, namely securing our target coverage at the limit of clinical acceptability, is considered by default. The reduction of the target margin to the bare minimum is the main drive that enables substantial and consistent OAR sparing results, further highlighting the need for an alternative robust evaluation approach that provides an estimate of robustness at the limit of clinical acceptability (i.e. adequate coverage for at least 90% of patients).
The choice of the evaluation method did impact whether the minimum target coverage threshold was achieved. The CE approach that generates error scenarios based on combinations of the maximum errors of each uncertainty source (setup, range and breathing) led to three SSMP and seven SSJP patients not meeting our minimum worst-case scenario threshold. This results from the selection of extreme scenarios outside the 90% hypersphere in the scenario space, therefore beyond the commonly accepted 90% confidence level. As per Van Herk et al. [
• van Herk M.
• Remeijer P.
• Rasch C.
• Lebesque J.V.
The probability of correct target dosage: Dose-population histograms for deriving treatment margins in radiotherapy.
] when using a PTV margin approach, in order to ensure target robustness, appropriate confidence levels need to be established, quantifying to at least 90%. SE allows for a better definition of the confidence interval due to it evaluating in the dosimetric space, i.e., selecting the optimal dose distribution based on previously defined clinical criteria, instead of the scenario space, i.e., whether the scenarios are covered by a set of given robustness parameters.
When using SE and sampling scenarios from the entire dosimetric space [
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
] only patient #2 failed to meet our clinical threshold for both scenario selection methods. This can be attributed to patient #2 having a distinct layout in terms of the number of targeted lymph nodes and their distribution across both lungs which makes compromising between meeting our target goals and not overdosing the OARs significantly more difficult. Results obtained using SE can be attributed to the more detailed, better quantified, and less conservative way it operates. By randomly sampling error scenarios and recomputing the dose distribution for the best 90% scenarios based on target D95 it overall offers a more realistic view on the robustness of the treatment than its conventional counterpart. This discrepancy has significant clinical implications. A more conservative evaluation approach such as the commonly-used worst-case conventional evaluation (CE) has the potential of leading to results that do not pass our clinical criteria since the method by which scenarios are selected only takes into account the maximum errors of each source, not the joint probability of these magnitudes occurring during the treatment process. In clinical practice, this can lead to replanning due to the lack of target coverage, further potentially leading to an increase in OAR dose. Use of a more statistically sound robust evaluation method, such as SE, has the potential to save time within the treatment workflow by reducing the need for further optimization. It should be noted that our approach was limited by both the lack of consideration for the interplay effect as well as the patient-specific CTV expansion done in an effort to reduce treatment planning complexity, as a workaround for having a setup error larger than 5 mm. While exact numbers weren’t recorded, the estimated computation time of the different techniques should be noted, since it can be relevant for their clinical applicability. For optimization, both SSMP and SSJP performed similarly, in the 1–2 h range. While SSJP requires precomputation time in generating the 30 virtual CTs and importing them into our TPS, it makes up for time since the errors are already incorporated in said CTs. A starker difference was noted between CE and SE. While CE can be performed in a matter of 2–3 h, a full SE consisting of 100 scenarios took up to 72 h per case, making it impractical to use in a clinical setting. However, improvements can be achieved using variance reduction techniques and additional computational resources (here the simulation was performed on a single desktop.
Our planning objective was ensuring the CTV was covered in all phases, given this, it is intuitive to evaluate our target’s ITV coverage when using SE. However, this did not properly correlate to how breathing motion was simulated in our initial use of SE. For SE, breathing is simulated by recomputing the dose for each breathing phase and accumulating the dose on the mid-position CT (MidP-CT). To ensure CTV coverage in all breathing phases, a second, control SE approach was used. When using SE_ITV, systematic errors were compiled for each breathing phase and the worst-case was determined out of the 10 computed dose maps, in other words for each scenario, target coverage is being assessed and compared for each of the 10 breathing phases it encompasses. Comparing SE and SE_ITV methods, negligible differences were noted as seen in Table 3, which confirms the CTV is covered by the minimum clinical dose threshold at every phase. SE can be safely implemented as an evaluation tool leading to SSJP becoming a viable scenario selection option for improved OAR sparing while maintaining acceptable levels of target robustness.

## 5. Conclusions

Establishing a proper robust optimization and evaluation workflow is essential to realize the potential of proton therapy. Choosing the appropriate methods is both a matter of considering their statistical consistency as well as pragmatic factors such as processing time and whether the emphasis should be placed on maintaining target coverage or sparing adjacent OARs on a patient-by-patient basis.
Two methods of selecting treatment scenarios for robust optimization were used and compared: scenario selection from marginal probabilities (SSMP) and scenario selection from joint probabilities (SSJP), a method of preselecting a statistically-sound set of treatment error scenarios developed by Buti et al [
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
]. For evaluating the robustness of the plans, the conventional approach (CE) and statistical evaluation (SE) were used.
Use of the SSJP tool led to significant OAR sparing in all recorded metrics (Dmean, V30, D2) with a target robustness within our clinical objectives provided that a statistically sound and comprehensive robustness evaluation method was used (SE). This highlights the importance of using both advanced optimization and evaluation tools when we aim at ensuring a quantified level of robustness.

## Acknowledgements

Vlad Mihai Badiu is supported by Stichting tegen Kanker (Grant reference number FAF-C/2018/1195). Kevin Souris is funded by the Walloon region (MECATECH/BIOWIN, grant number 8090). Elena Borderías Villarroel is supported by a Baillet Latour grant. Gregory Buti is supported by the Télévie Grant from the Belgian ‘Fonds National pour la Recherche Scientifique’ F.R.S-FNRS (Grant No. 7 453 918F).

## Appendix A. Supplementary data

• Supplementary data 1

## References

• Chang J.Y.
• Zhang X.
• Knopf A.
• Li H.
• Mori S.
• Dong L.
• et al.
Consensus Guidelines for Implementing Pencil-Beam Scanning Proton Therapy for Thoracic Malignancies on Behalf of the PTCOG Thoracic and Lymphoma Subcommittee.
Int J Radiat Oncol Biol Phys. 2017; 99: 41-50https://doi.org/10.1016/j.ijrobp.2017.05.014
• Paganetti H.
Range uncertainties in proton therapy and the role of Monte Carlo simulations.
Phys Med Biol. 2012; 57: R99-R117https://doi.org/10.1088/0031-9155/57/11/R99
• Brousmiche S.
• Souris K.
• De Xivry J.O.
• Lee J.A.
• Macq B.
• Seco J.
Combined influence of CT random noise and HU-RSP calibration curve nonlinearities on proton range systematic errors.
Phys Med Biol. 2017; 62: 8226-8245https://doi.org/10.1088/1361-6560/aa86e9
• Kraus K.M.
• Heath E.
• Oelfke U.
Dosimetric consequences of tumour motion due to respiration for a scanned proton beam.
Phys Med Biol. 2011; 56: 6563-6581https://doi.org/10.1088/0031-9155/56/20/003
• Park P.C.
• Cheung J.P.
• Zhu X.R.
• Lee A.K.
• Sahoo N.
• Tucker S.L.
• et al.
Statistical assessment of proton treatment plans under setup and range uncertainties.
Int J Radiat Oncol Biol Phys. 2013; 86: 1007-1013https://doi.org/10.1016/j.ijrobp.2013.04.009
• Liu W.
• Zhang X.
• Li Y.
• Mohan R.
Robust optimization of intensity modulated proton therapy.
Med Phys. 2012; 39: 1079-1091https://doi.org/10.1118/1.3679340
• Pflugfelder D.
• Wilkens J.J.
• Oelfke U.
Worst case optimization: A method to account for uncertainties in the optimization of intensity modulated proton therapy.
Phys Med Biol. 2008; 53: 1689-1700https://doi.org/10.1088/0031-9155/53/6/013
• Fredriksson A.
• Forsgren A.
• Hårdemark B.
Minimax optimization for handling range and setup uncertainties in proton therapy.
Med Phys. 2011; 38: 1672-1684https://doi.org/10.1118/1.3556559
• Bangert M.
• Hennig P.
• Oelfke U.
Analytical probabilistic modeling for radiation therapy treatment planning.
Phys Med Biol. 2013; 58: 5401-5419https://doi.org/10.1088/0031-9155/58/16/5401
• Unkelbach J.
• Paganetti H.
Robust Proton Treatment Planning: Physical and Biological Optimization.
• Cummings D.
• Tang S.
• Ichter W.
• Wang P.
• Sturgeon J.D.
• Lee A.K.
• et al.
Four-dimensional Plan Optimization for the Treatment of Lung Tumors Using Pencil-beam Scanning Proton Radiotherapy.
Cureus. 2018; 10https://doi.org/10.7759/cureus.3192
• Inoue T.
• Widder J.
• van Dijk L.V.
• Takegawa H.
• Koizumi M.
• Takashina M.
• et al.
Limited Impact of Setup and Range Uncertainties, Breathing Motion, and Interplay Effects in Robustly Optimized Intensity Modulated Proton Therapy for Stage III Non-small Cell Lung Cancer.
Int J Radiat Oncol Biol Phys. 2016; 96: 661-669https://doi.org/10.1016/j.ijrobp.2016.06.2454
• Korevaar E.W.
• Habraken S.J.M.
• Scandurra D.
• Kierkels R.G.J.
• Unipan M.
• Eenink M.G.C.
• et al.
Practical robustness evaluation in radiotherapy – A photon and proton-proof alternative to PTV-based plan evaluation.
• Sterpin E.
• Rivas S.T.
• Van den Heuvel F.
• George B.
• Lee J.A.
• Souris K.
Development of robustness evaluation strategies for enabling statistically consistent reporting.
Phys Med Biol. 2021; 66: 045002https://doi.org/10.1088/1361-6560/abd22f
• Buti G.
• Souris K.
• Montero A.M.B.
• Lee J.A.
• Sterpin E.
Towards fast and robust 4D optimization for moving tumors with scanned proton therapy.
Med Phys. 2019; 46: 5434-5443https://doi.org/10.1002/mp.v46.1210.1002/mp.13850
• Souris K.
• Lee J.A.
• Sterpin E.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.
Med Phys. 2016; 43: 1700-1712https://doi.org/10.1118/1.4943377
• Perkó Z.
• van der Voort S.R.
• van de Water S.
• Hartman C.M.H.
• Hoogeman M.
• Lathouwers D.
Fast and accurate sensitivity analysis of IMPT treatment plans using Polynomial Chaos Expansion.
Phys Med Biol. 2016; 61: 4646-4664https://doi.org/10.1088/0031-9155/61/12/4646
• Taasti V.T.
• Hattu D.
• Vaassen F.
• Canters R.
• Velders M.
• Mannens J.
• et al.
Treatment planning and 4D robust evaluation strategy for proton therapy of lung tumors with large motion amplitude.
Med Phys. 2021; 48: 4425-4437https://doi.org/10.1002/mp.v48.810.1002/mp.15067
• Wanet M.
• Sterpin E.
• Janssens G.
• Delor A.
• Lee J.A.
• Geets X.
Validation of the mid-position strategy for lung tumors in helical TomoTherapy.
• Borderías Villarroel E.
• Geets X.
• Sterpin E.
Online adaptive dose restoration in intensity modulated proton therapy of lung cancer to account for inter-fractional density changes.
Phys Imaging Radiat Oncol. 2020; 15: 30-37https://doi.org/10.1016/j.phro.2020.06.004
• van Herk M.
• Remeijer P.
• Rasch C.
• Lebesque J.V.
The probability of correct target dosage: Dose-population histograms for deriving treatment margins in radiotherapy.
Int J Radiat Oncol Biol Phys. 2000; 47: 1121-1135https://doi.org/10.1016/S0360-3016(00)00518-6
• Souris K.
• Barragan Montero A.
• Janssens G.
• Di Perri D.
• Sterpin E.
• Lee J.A.
Technical Note: Monte Carlo methods to comprehensively evaluate the robustness of 4D treatments in proton therapy.
Med Phys. 2019; 46: 4676-4684https://doi.org/10.1002/mp.v46.1010.1002/mp.13749
• Liu W.
• Schild S.E.
• Chang J.Y.
• Liao Z.
• Chang Y.-H.
• Wen Z.
• et al.
Exploratory Study of 4D versus 3D Robust Optimization in Intensity Modulated Proton Therapy for Lung Cancer.
Int J Radiat Oncol Biol Phys. 2016; 95: 523-533https://doi.org/10.1016/j.ijrobp.2015.11.002
• Witte M.G.
• Sonke J.J.
• Siebers J.
• Deasy J.O.
• Van Herk M.
Beyond the margin recipe: The probability of correct target dosage and tumor control in the presence of a dose limiting structure.
Phys Med Biol. 2017; 62: 7874-7888https://doi.org/10.1088/1361-6560/aa87fe
• De Ruysscher D.
• Faivre-Finn C.
• Moeller D.
• Nestle U.
• Hurkmans C.W.
• Le Péchoux C.
• et al.
European Organization for Research and Treatment of Cancer (EORTC) recommendations for planning and delivery of high-dose, high precision radiotherapy for lung cancer.