Full length article| Volume 100, P72-80, August 01, 2022

# Evaluating suggested stricter gamma criteria for linac-based patient-specific delivery QA in the conventional and SBRT environments

Published: June 24, 2022

## Highlights

• Stricter γ criteria may have practical implications including treatment start delays.
• VMAT site-specific delivery QA tolerance limits are unnecessary.
• IMRT/VMAT QA passing rate and plan complexity do not correlate over the range studied.
• VMAT SBRT QA passing rates do not correlate with target volume over the range studied.

## Abstract

### Purpose

To evaluate AAPM TG-218 recommended tolerances for IMRT QA for conventional and SBRT delivery.

### Methods

QA analysis was repeated for 150 IMRT/VMAT patients with varying gamma criteria. True composite delivery was utilized, corrected for detector and output variation. Universal tolerance (TLuniv) and action limits (ALuniv) were compared with statistical process control (SPC) TLSPC and ALSPC values.
Analysis was repeated as a function of plan complexity for 250 non-stereotactic body radiotherapy (SBRT) VMAT patients at 3%/2mm and a threshold of 10% and for 75 SBRT VMAT patients at 2%/2 mm and a threshold of 50% with results plotted as a function of PTV volume. Regions of failure were dose-scaled on the planning CT data sets based on delivery results.

### Results

The IMRT/VMAT TLSPC and ALSPC for gamma criteria of 3%/3 mm were 96.5% and 95.6% and for 3%/2 mm were 91.2% and 89.2%, respectively. Correlation with plan complexity for conventional fractionation VMAT was “low” for all sites with pelvis having the highest r value at −0.35. The equivalent SBRT PTV diameter ranged from 2.0 cm to 5.6 cm. Negative low correlation was found for 38 of 75 VMAT cases below ALuniv.

### Conclusions

The ALuniv and ALSPC are similar for 3%/2 mm. However, our 5% failure rate for ALuniv may result in treatment start delays approximately 2 times/month, given 40 new cases/month. VMAT QA failure at stricter criteria did not correlate strongly with plan complexity. Site-specific action limits vary less than 3% from the average. SBRT QA results do not strongly correlate with target size over the range studied.

## Introduction

The AAPM TG-218 report provides excellent guidance on the procedures to be followed when performing patient-specific IMRT delivery QA [Miften M, Olch A, Mihailidis D, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendation of AAPM Task Group No. 218. Med Phys 2018; 45 (4): e53–e83]. In addition, the use of statistical process control (SPC) analysis allows for evaluation of the “strength” of a QA program as compared to universally accepted tolerance (TLuniv) and action limits (ALuniv). It has been recommended that the gamma criteria, comprising percent dose (%) and distance to agreement (DTA), to be evaluated at the aforementioned limits, be tightened from the commonly used 3%/3mm to 3%/2mm (both at a dose threshold of 10%). The application of these recommended stricter gamma criteria (i.e., lower DTA) has practical ramifications that should be understood prior to implementation.
The use of the aforementioned TLuniv and ALuniv concepts assumes that these limits are coincident with the goals of the treatment plan being evaluated. For scenarios where this may not be the case such as SBRT, the report suggests that tighter tolerances be considered. In addition, stricter gamma criteria should be used to detect subtle regional errors. SBRT planning typically involves smaller targets and results in less homogeneous dose distributions than more conventionally fractionated IMRT/VMAT plans. An understanding of the clinical significance of this potential decrease in passing rate is essential for SBRT treatment delivery.
The AAPM TG-218 report indicates that a source of variation in IMRT QA measurement can be attributed to variation in plan complexity, specifically between different body sites. It is suggested that it is good practice to determine tolerance limits for cases of high vs. low complexity (e.g., head and neck vs. prostate IMRT cases). To this end it is useful to have an understanding of the effects of plan complexity on delivery QA gamma passing rates.
The purpose of this work is to evaluate, based on the current strength of our QA process, the AAPM TG-218 recommendations for QA tolerances for patient-specific intensity modulated treatments, specifically with regard to gamma analysis. We evaluate whether stricter gamma criteria may result in delays in patient treatment initiation. We evaluate whether, in the SBRT setting, the use of stricter gamma criteria combined with smaller targets, non-homogeneous dose distributions and the finite detector spacing of 2D arrays may contribute to lower passing rates. We also evaluate whether VMAT plan complexity has an effect on gamma passing rates and whether site-specific tolerance limits are necessary. Finally, this work helps illustrate the clinical significance of deviations in delivery detected through the use of stricter gamma criteria for IMRT/VMAT in the conventional and SBRT environments.

## Methods and materials

### IMRT/VMAT: Conventional fractionation

Patient-specific ion chamber array-based delivery QA analysis was repeated with varying gamma criteria for 150 IMRT/VMAT patients previously treated between May 2018 and January 2019. For all cases throughout this study our planning techniques varied minimally between multiple planners and physicians, and departmental site-specific dose-volume plan acceptance criteria remained consistent over the given time periods. All delivery QA measurements were obtained using one of two IBA MatriXX Evolution and MultiCube Lite phantom combinations and analyzed with the OmniPro™ I’mRT software (IBA Dosimetry America, Bartlett, TN). Both systems were calibrated according to the manufacturer’s instructions prior to use. This 2D array utilizes 1020 ion chambers with 0.76 cm center-to-center spacing. All measurements were interpolated to 1 mm resolution prior to analysis. True composite delivery was utilized, with all analysis performed in absolute dose and corrected for detector and output variation, with angular correction applied to all IMRT cases [Van Esch A, Clermont C, Devillers M, et al. On-line quality assurance of rotational radiotherapy treatment delivery by means of a 2D ion chamber array and the Octavius phantom; Masi L, Casamassima F, Doro R, et al. Quality assurance of volumetric modulated arc therapy: Evaluation and comparison of different dosimetric systems; Jursinic PA, Sharma R, Reuter J. MapCHECK used for rotational IMRT measurements: Step-and-shoot, Tomotherapy]. Angular correction was based on measurements made on our device using a 10 × 10 cm² field size at varying beam angles and applied using in-house developed software that also corrects for machine output variation. TG-218 indicates that “…angular dependence may be smeared out when using beams from many angles, such as with VMAT delivery.…” [Miften M, Olch A, Mihailidis D, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendation of AAPM Task Group No. 218. Med Phys 2018; 45 (4): e53–e83]. It has been our practice not to apply angular correction to VMAT measurements for gamma analysis and our results support this. The software used does not automatically optimize the position of the dose distribution to minimize spatial uncertainty between measured and planned dose distributions. Any adjustment in the position of the measured dose distribution was made manually in the x and y direction on the coronal plane and limited to perceived phantom setup error (typically 1 mm in either direction). Passing rates for TLuniv and ALuniv (≥95% and ≥90%, respectively) were determined following global normalization and a low dose threshold of 10%. Center-specific SPC TL (TLSPC) and AL (ALSPC) values were determined for comparison with the TLuniv and ALuniv. The center-specific values were determined according to the method given in the AAPM TG-218 report using the following equation,
$\Delta A = \beta\sqrt{\sigma^{2} + (\bar{x} - T)^{2}}$
(1)

where ΔA is the difference between upper and lower action limits and is typically written as ±A/2, T is the process target value, and $\sigma^{2}$ and $\bar{x}$ are the process variance and mean, respectively. As our target value for IMRT QA gamma passing rate was known (i.e., 100%), this value was used for T. β is a constant combining the cutoff for an acceptably performing process and a factor that balances type I errors (rejecting the null hypothesis when it is true) against type II errors (failing to reject it when it is false). For IMRT QA the null hypothesis is that the process is unchanging. As suggested based on current information, a value of 6.0 was used. Control chart limits determined from an I-chart of individual QA measurements were used as tolerance limits. The I-chart is a statistical tool used to identify QA measurements displaying abnormal process behavior. The I-chart has upper and lower control limits and a center line, all calculated from the IMRT QA measurement data. Abnormal behavior is indicated when an IMRT QA measurement results in a value outside the upper or lower limits on the I-chart. The center line and upper and lower limits are given as follows:
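$\text{center line} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
(2)

$\text{upper control limit} = \text{center line} + 2.660 \cdot \overline{mR}$
(3)

$\text{lower control limit} = \text{center line} - 2.660 \cdot \overline{mR}$
(4)

$\overline{mR} = \frac{1}{n-1}\sum_{i=2}^{n} \left| x_{i} - x_{i-1} \right|$
(5)

where $x_{i}$ is an individual QA measurement, n is the total number of measurements and $\overline{mR}$ is the average moving range.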
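The SPC quantities in Eqs. (1) to (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the passing rates are hypothetical, the population variance is used for σ² (the text does not say which estimator was applied), and the lower action limit is taken as T − ΔA/2, reading "±A/2" as a band centered on the target. The function names are illustrative and do not come from the authors' software.

```python
import math
from statistics import mean, pvariance

BETA = 6.0      # value used in the text
TARGET = 100.0  # target passing rate T (%), as in the text

def action_limit_width(rates, target=TARGET, beta=BETA):
    """Eq. (1): Delta A = beta * sqrt(sigma^2 + (mean - T)^2)."""
    x_bar = mean(rates)
    sigma2 = pvariance(rates)  # assumption: population variance
    return beta * math.sqrt(sigma2 + (x_bar - target) ** 2)

def ichart_limits(rates):
    """Eqs. (2)-(5): lower limit, center line, upper limit of the I-chart."""
    center = mean(rates)
    moving_ranges = [abs(b - a) for a, b in zip(rates, rates[1:])]
    mr_bar = sum(moving_ranges) / (len(rates) - 1)
    return center - 2.660 * mr_bar, center, center + 2.660 * mr_bar

# Hypothetical gamma passing rates (%) in chronological order
rates = [98.0, 96.0, 99.0, 97.0, 95.0]
delta_a = action_limit_width(rates)
lower_action_limit = TARGET - delta_a / 2.0  # assumption: band of +/- A/2 around T
lcl, center, ucl = ichart_limits(rates)
print(f"action limit ~ {lower_action_limit:.1f}%, tolerance limit ~ {lcl:.1f}%")
```

Because a passing rate cannot exceed 100%, only the lower limits are of practical interest here; the upper control limit is returned for completeness.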
To evaluate the effects of increasing plan complexity on passing rate an approximate modulation scaling factor (MSFapprox) was used as a surrogate, defined as:
$MSF_{approx} = \frac{MU_{IMRT}}{Dose_{frac}}$
(6)

where MUIMRT is the required MU for a single fraction of the IMRT/VMAT plan and Dosefrac is the associated prescribed dose for the same fraction. This approximation is based on the general idea that complex plans are derived from complex fluence distributions requiring leaf sequences containing many small, high-MU segments to deliver. Passing rate as a function of MSFapprox was evaluated for both IMRT and VMAT delivery and gamma criteria of 3%/3mm and 3%/2mm using a 10% threshold.
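As a rough illustration of how Eq. (6) feeds the correlation analysis, the sketch below computes MSFapprox per plan and the Pearson product moment r between MSFapprox and passing rate. The plan data are hypothetical, the dose unit (cGy) is an assumption, and the helper names are illustrative only.

```python
import math

def msf_approx(mu_per_fraction, dose_per_fraction):
    """Eq. (6): monitor units for one fraction divided by the fraction dose."""
    return mu_per_fraction / dose_per_fraction

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical plans: (MU per fraction, dose per fraction in cGy, passing rate in %)
plans = [(520.0, 200.0, 98.5), (640.0, 200.0, 97.9),
         (450.0, 180.0, 99.1), (900.0, 300.0, 96.8)]
complexity = [msf_approx(mu, dose) for mu, dose, _ in plans]
passing = [rate for _, _, rate in plans]
print(f"r = {pearson_r(complexity, passing):.2f}")
```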
The average time between QA delivery and patient start was also determined for these patients. This time was used to calculate the frequency of patient start delays assuming the AL defines a “no treatment” threshold.

### Conventional fractionation VMAT and site-specific tolerance limits

TG-218 suggests that plan complexity, and subsequently QA delivery passing rates, vary by treatment site and that it is good practice to calculate site-specific tolerance limits. To evaluate this claim, a series of 250 VMAT cases treated between May 2014 and February 2021, previously evaluated at 3%/3mm and a 10% threshold, were re-evaluated at 3%/2mm and a 10% threshold. VMAT delivery was chosen as it was found to have a higher ALSPC than IMRT delivery in our QA system. Based on our departmental delivery QA experience we hypothesized that site-specific tolerance limits are unnecessary. Cases used for evaluation were selected as follows: H&N (n = 76), Thorax (n = 40), Abdomen (n = 61) and Pelvis (n = 73). Each treatment site was evaluated as a function of MSFapprox as defined above. SPC methods were used to compare the action limit of each subset to the universal action limit (ALuniv) of 90%.

### VMAT SBRT

TG-218 suggests gamma criteria for SBRT be tighter or stricter than those used for non-SBRT delivery. To this end we selected 75 VMAT SBRT cases treated between October 2017 and December 2019 that had previously passed routine criteria of 3%/3mm at a 10% threshold and re-analyzed them at 2%/2mm and a threshold of 50%. All cases were evaluated as a function of PTV volume. Our reasoning for this threshold was based on the rapid dose falloff exploited by SBRT planning. With all evaluations globally normalized to a value within 10% of the maximum dose in the measured data set, resulting values below 50% would be less likely to be clinically significant (i.e., less likely to exceed normal tissue dose volume limits). In addition, by decreasing the total number of pixels evaluated in the low dose regions, we can decrease the likelihood of a delivery achieving an acceptable gamma passing rate while failing in clinically significant regions (e.g., the target). Acknowledging the difficulty in representing the multitude of complex PTV shapes encountered, the equivalent diameter was determined by converting the PTV volume to a sphere for illustration purposes. The equivalent diameter ranged from 2 to 5.6 cm, corresponding, for example, to CTV/ITV diameters from 1 to 4.6 cm using a 0.5 cm uniform PTV margin. Because the detector spacing of our device may make it difficult to measure dose distributions for smaller targets, data for PTVs < 2 cm diameter are not presented in this work. Given the 0.76 cm center-to-center spacing of our detector array, our hypothesis was that we would see decreasing gamma passing rates as a function of decreasing PTV diameter. The upper limit of 5.6 cm diameter corresponds approximately with the largest targets selected for SBRT at our institution.
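The equivalent diameter described above follows from the volume of a sphere, V = (4/3)π(d/2)³. A minimal sketch, assuming the PTV volume is given in cm³ and that the CTV/ITV diameter is obtained by subtracting a uniform margin on both sides (function names are illustrative):

```python
import math

def equivalent_diameter_cm(ptv_volume_cm3):
    """Diameter of the sphere whose volume equals the PTV volume."""
    return 2.0 * (3.0 * ptv_volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)

def inner_diameter_cm(ptv_diameter_cm, margin_cm=0.5):
    """CTV/ITV diameter after removing a uniform PTV margin on each side."""
    return ptv_diameter_cm - 2.0 * margin_cm

# Volumes corresponding to the 2 cm and 5.6 cm diameters quoted in the text
v_small = 4.0 / 3.0 * math.pi * 1.0 ** 3   # about 4.2 cm^3
v_large = 4.0 / 3.0 * math.pi * 2.8 ** 3   # about 92 cm^3
for v in (v_small, v_large):
    d = equivalent_diameter_cm(v)
    print(f"V = {v:.1f} cm^3 -> PTV d = {d:.1f} cm, CTV/ITV d = {inner_diameter_cm(d):.1f} cm")
```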
In order to determine the clinical relevancy of dose levels of points failing the ALuniv as suggested in TG-218, the cases were evaluated for failures inside and outside the PTV. Delivery QA and gamma analysis were performed with failing regions identified in the axial, sagittal and coronal planes of the original treatment planning CT using isocenter as reference. The corresponding dose was adjusted (scaled) based on the measured vs. planned phantom dose ratio at the same coordinates.
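One plausible reading of the dose scaling step is a simple ratio correction: the planned patient dose at a failing location is multiplied by the measured-to-planned phantom dose ratio at the same coordinates. The sketch below illustrates that reading only; the numbers are hypothetical and the function is not taken from the authors' workflow.

```python
def scaled_patient_dose(planned_patient_dose, measured_phantom_dose, planned_phantom_dose):
    """Scale the planned patient dose by the measured/planned phantom dose ratio."""
    if planned_phantom_dose <= 0.0:
        raise ValueError("planned phantom dose must be positive")
    return planned_patient_dose * (measured_phantom_dose / planned_phantom_dose)

# Hypothetical failing point: the phantom measurement reads 5% low
dose_gy = scaled_patient_dose(planned_patient_dose=50.0,
                              measured_phantom_dose=1.90,
                              planned_phantom_dose=2.00)
print(f"estimated delivered dose at this point: {dose_gy:.1f} Gy")
```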

## Results

### IMRT/VMAT: Conventional fractionation

From the 150 IMRT/VMAT cases evaluated, the percentage of pixels falling between TLuniv and ALuniv as well as below ALuniv are presented in Table 1. Additionally, the center-specific TLSPC and ALSPC are determined as demonstrated in TG-218 using the increased number of patients and compared with the universal limits. The average global normalization was 95.2% of the maximum dose in the measurement plane. The average passing rates for the total cohort for gamma criteria of 3%/3mm and 3%/2mm were 98.8% and 96.5%, respectively. The percentage of cases failing TLuniv for gamma criteria of 3%/3mm and 3%/2mm were 1% and 19% and failing ALuniv were 0% and 5%, respectively. During the evaluation of the 5% of cases failing the ALuniv at 3%/2mm, it was found that 7 patients were associated with IMRT delivery and 1 patient received VMAT delivery. Consultation with the attending physicians with respect to the OAR or target involved, indicated that none of the patients would have had their starts delayed based on these findings alone. The center-specific TLSPC and ALSPC for gamma criteria of 3%/3mm and 3%/2mm were 98.6%, 94.9% and 95.8%, 86.1%, respectively. While the ALSPC of 86.1% for 3%/2mm is reasonably close to the ALuniv of 90% it is indicative of the need for improvement in our overall delivery QA process. This improvement may include changing our routine phantom alignment methodology to be image-based rather than using positional lasers and linac crosshairs. Changing to conebeam computed tomography (CBCT) would be more in line with how our patients are aligned on a daily basis. Optimal plane selection to position the detector array within the high dose low gradient region for routine delivery QA may also help improve the process. Improvement in our angular correction methodology for IMRT delivery to include potential change as a function of segment size as well as where the segment(s) is located within the array may also be beneficial. 
The latter 2 topics are the subjects of separate ongoing studies at our institution. The average of the maximum gamma values found during analysis for 3%/3mm and 3%/2mm were 1.46 and 1.73, corresponding to 1.38%/1.38 mm and 2.19%/1.46 mm beyond limits, respectively. The average percentage of pixels exceeding a gamma value of 1.5 was 0.04% and 0.27%, respectively. Evaluating the IMRT and VMAT cases separately yields ALSPC values of approximately 82.2% and 91.6%, respectively, as depicted in Fig. 1. The lower value for IMRT indicates that VMAT QA performs better than IMRT with respect to SPC in our process. The percentage of pixels falling below ALuniv as well as ALSPC as a function of decreasing gamma criteria are presented in Table 2.
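The "beyond limits" figures quoted above are consistent with scaling both criteria by the amount the gamma value exceeds 1. A short check, assuming that interpretation (the function name is illustrative):

```python
def excess_beyond_criteria(gamma, dose_criterion_pct, dta_criterion_mm):
    """Dose (%) and distance (mm) by which a gamma value exceeds the criteria."""
    scale = gamma - 1.0
    return scale * dose_criterion_pct, scale * dta_criterion_mm

for gamma, dose_pct, dta_mm in [(1.46, 3.0, 3.0), (1.73, 3.0, 2.0)]:
    d, s = excess_beyond_criteria(gamma, dose_pct, dta_mm)
    print(f"gamma {gamma} at {dose_pct:g}%/{dta_mm:g} mm -> {d:.2f}% / {s:.2f} mm beyond limits")
```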
Table 1. The percentage of pixels falling between TLuniv and ALuniv ((%) below TLuniv) as well as those below ALuniv. TLSPC and ALSPC values as a function of decreasing gamma criteria are also presented.

| IMRT/VMAT | 3%/3mm | 3%/2mm | 2%/2mm | 2%/1mm |
| --- | --- | --- | --- | --- |
| (%) below TLuniv | 1 | 19 | 32 | 5 |
| (%) below ALuniv | 0 | 5 | 36 | 94 |
| TLSPC (%) | 98.6 | 95.8 | 89.9 | 75.4 |
| ALSPC (%) | 94.9 | 86.1 | 68.7 | 27.9 |

Combined IMRT/VMAT TL and AL for stricter gamma criteria.
Table 2. The percentage of pixels falling below the ALuniv. ALuniv values only are evaluated as this parameter is taken to reflect the “do not treat” threshold. ALSPC values as a function of decreasing gamma criteria are also presented.

| Delivery | Metric | 3%/3mm | 3%/2mm | 2%/2mm | 2%/1mm |
| --- | --- | --- | --- | --- | --- |
| IMRT only | (%) below ALuniv | 0 | 9 | 60 | 99 |
| IMRT only | ALSPC (%) | 93.8 | 82.2 | 61.7 | 13.4 |
| VMAT only | (%) below ALuniv | 0 | 1 | 12 | 89 |
| VMAT only | ALSPC (%) | 96.1 | 91.6 | 77.8 | 46.0 |

IMRT and VMAT specific AL for stricter gamma criteria.
The percentage of pixels passing gamma criteria of 3%/3mm and 3%/2mm as a function of MSFapprox for IMRT only is given in Fig. 2. The Pearson product moment r value for 3%/3mm was −0.35 (p = 0.002), indicating “low” correlation. For the purposes of this study the following categories were used to describe the strength of correlation based on the magnitude of r: 0.00–0.25 (little if any), 0.26–0.49 (low), 0.50–0.69 (moderate), 0.70–0.89 (high) and 0.90–1.00 (very high). The r value for 3%/2mm was −0.25 (p = 0.031), indicating “little if any” correlation. The percentage of pixels passing gamma criteria of 3%/3mm and 3%/2mm as a function of MSFapprox for VMAT only is given in Fig. 3. The r values for 3%/3mm and 3%/2mm were −0.16 (p = 0.170) and −0.19 (p = 0.103), respectively, indicating “little if any” correlation.
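The correlation categories listed above can be written as a small lookup on the magnitude of r. How values falling between the stated bands (for example 0.255) are assigned is an assumption of this sketch, not something the text specifies.

```python
def correlation_strength(r):
    """Map a Pearson r to the descriptive categories used in this study."""
    magnitude = abs(r)
    if magnitude > 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if magnitude <= 0.25:
        return "little if any"
    if magnitude < 0.50:   # assumption: gaps between bands go to the higher band
        return "low"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "high"
    return "very high"

for r in (-0.35, -0.25, -0.16, -0.19):
    print(r, correlation_strength(r))
```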

### Conventional fractionation VMAT and site-specific tolerance limits

The correlation of gamma passing rates at 3%/2mm and a 10% threshold, as a function of MSFapprox, demonstrated in Fig. 4, was “low” for all sites, with pelvis having the highest r value at −0.35 (p = 0.002). All metrics evaluated are listed in Table 3. The site-specific action limits differ from the ALuniv by 0.9, −1.1, −2.3, and −4.4% for the Abdomen, H&N, Pelvis and Thorax, respectively. The difference in MSFapprox values between groups was significant (p < 0.01), indicating that plan complexity differs between groups. However, the difference in passing rates between groups did not reach significance, indicating that passing rate does not vary with plan complexity.

Table 3. Metrics for conventional fractionation VMAT, site-specific delivery QA. CV = coefficient of variation, r = Pearson product moment correlation.

| Site | TLSPC | ALSPC | Abs diff (ALuniv-ALSPC) | CV | r | p | Average MSFapprox |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Abdomen | 92.8 | 90.9 | 0.90 | 2.12 | −0.30 | 0.020 | 2.8 |
| H&N | 91.0 | 88.9 | 1.11 | 2.18 | −0.30 | 0.008 | 2.6 |
| Pelvis | 89.3 | 87.7 | 2.32 | 2.63 | −0.35 | 0.002 | 3.4 |
| Thorax | 88.7 | 85.6 | 4.40 | 3.08 | −0.28 | 0.080 | 3.2 |
| TOTAL | 90.1 | 88.4 | 1.60 | 2.50 | NA | NA | NA |

Treatment site-specific TL and AL (VMAT only).

### VMAT SBRT

Fig. 5 illustrates the initial passing rates at 3%/3mm (diamonds), the overall results for the stricter gamma criteria of 2%/2mm (hollow squares) and the values failing the ALuniv (90%) at the stricter criteria (triangles). Approximately 26% of the cases were below the ALuniv with increasing failure with increasing equivalent diameter. The correlation between passing rate and target size was low (r = −0.31, p = 0.006).
Fig. 6 illustrates the results for dose value comparisons failing criteria within the PTV only. Fifty-seven percent of these cases were evaluated at 2 regions failing criteria when compared to calculation. Given the ablative intent of SBRT and the resultant heterogeneous dose distributions within the target, comparison of the failed regions with the maximum percentage of prescribed dose in the approved treatment plans shows that the deviations would not warrant treatment cancellation or delay. This process was repeated for regions outside the PTV with results illustrated in Fig. 7. The 50% threshold was chosen to include critical structures adjacent to the target. Four of the 7 regions fell below approved doses (2 cases had 2 regions). Three of the 7 regions exceeded the dose depicted in the TPS (max deviation 20.2%); however, none of the regions resulted in doses that would exceed OAR tolerance.

## Discussion

### IMRT/VMAT: Conventional fractionation

Throughout this work we have chosen to concentrate mainly on the AL boundary vs. the TL. We consider the AL a “no-treatment lower limit” for passing rate, indicating that our process may be out of control. The information obtained from the gamma analysis in one or two irradiation planes (requiring table height adjustment between planes and phantom re-irradiation) is insufficient to determine the clinical significance of the QA failures. Additionally, it has long been the practice at our center to investigate cases further on an individual basis if there is variance from the expected value for a given class of cases (e.g., H&N, prostate, etc.), regardless of whether the TL has been met. We have also found value in the SPC method in determining the strength of our process by comparing our ALSPC to the ALuniv. Our combined IMRT/VMAT ALSPC was 94.9% and 86.1% for 3%/3mm and 3%/2mm (10% threshold), respectively. While 86.1% is “close to” the 90% ALuniv value proposed in TG-218, it indicates improvement may be needed. This improvement may include image-based phantom alignment, optimization of phantom measurement plane selection as well as a more robust angular correction method. Separating this value by delivery method yields 82.2% and 91.6% ALSPC for IMRT and VMAT at 3%/2mm, respectively. The lower value for IMRT is thought to be related to our implementation of device angular correction and its dependency on segment size and position. This dependency is being investigated further.
From Fig. 1 it can be seen that while adopting the AAPM TG-218 suggested gamma criteria of 3%/2mm (10% threshold) is possible, further reductions would require significant improvement in our QA process. As it stands, the suggested criteria are associated with practical ramifications. As previously mentioned, we consider the AL to be a no-treatment boundary. From Table 1 it is seen that our QA process results in values failing this boundary approximately 5% of the time for a gamma of 3%/2mm. In addition, we have found that plan approval and subsequent delivery QA are performed the evening prior to treatment start approximately 40% of the time. If we assume 1700 patients are treated annually with 70% via IMRT/VMAT delivery, this would indicate approximately 2 patients/month having their treatment start date delayed. This will inevitably have ramifications for the patient ranging from scheduling issues to coordination with chemotherapy delivery. Failure of the AL for the current criteria of 3%/3mm is found to be 0%. Should the stricter gamma criteria prove beneficial to the patient, these ramifications would, of course, be warranted.
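The delay estimate above is a product of the stated rates. A short worked calculation using only the figures given in the text (variable names are illustrative):

```python
patients_per_year = 1700   # assumed annual patient load, from the text
imrt_vmat_fraction = 0.70  # share treated with IMRT/VMAT
evening_before = 0.40      # QA performed the evening prior to treatment start
below_action_limit = 0.05  # share failing ALuniv at 3%/2mm

imrt_vmat_per_month = patients_per_year * imrt_vmat_fraction / 12.0
late_qa_per_month = imrt_vmat_per_month * evening_before
delays_per_month = late_qa_per_month * below_action_limit

print(f"IMRT/VMAT cases per month:        {imrt_vmat_per_month:.0f}")
print(f"QA the evening before, per month: {late_qa_per_month:.0f}")
print(f"expected start delays per month:  {delays_per_month:.1f}")
```

The intermediate value of roughly 40 cases per month with QA the evening before matches the "40 new cases/month" quoted in the abstract.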
We found no meaningful correlation for this patient population, supporting the idea, as reported by others, that delivery failure as depicted through gamma analysis does not correlate well with clinically meaningful endpoints even for stricter criteria [Kruse JJ. On the insensitivity of single field planar dosimetry to IMRT inaccuracies; Nelms BE, Zhen H, Tome WA. Per-beam, planar IMRT QA passing rates do not predict clinically relevant patient dose errors; Zhen H, Nelms BE, Tome WA. Moving from γ passing rates to patient DVH-based QA metrics in pretreatment dose QA; Carrasco P, Jornet N, Latorre A, et al. 3D DVH-based metric analysis versus per-beam planar analysis in IMRT pretreatment verification; Stasi M, Bresciani S, Miranti A, et al. Pretreatment patient-specific IMRT quality assurance: A correlation study between γ index and patient clinical dose volume histogram; Coleman L, Skourou C. Sensitivity of volumetric modulated arc therapy patient specific QA results to multileaf collimator errors and correlation to dose volume histogram based metrics. Med Phys 2013; 40 (11): 111715-1–111715-7; Glenn MC, Hernandez V, Saez J, Followill DS, Howell RM, Pollard-Larkin JM, et al. Treatment plan complexity does not predict IROC Houston anthropomorphic head and neck phantom performance]. In fact, both false negative and false positive results are possible, which makes accepting plan deliverability on the basis of a passing rate alone questionable [Stasi M, Bresciani S, Miranti A, et al. Pretreatment patient-specific IMRT quality assurance: A correlation study between γ index and patient clinical dose volume histogram]. This should not reflect negatively on the usefulness of the gamma index method as a tool but may in part be a function of how the QA system is commonly implemented [Ouyang L, Gu X, Pompoš A, Bao Q, Solberg TD. Breaking bad IMRT QA practice] (i.e., measurements made in the high dose low gradient region), where the measurement apparatus may or may not be coincident with the plane containing organs at risk. Given the 2D nature of available detector arrays, it is impractical to acquire enough information to potentially lead to markedly improved significance.
It has also been suggested that failure rates may increase as a function of plan complexity [Miften M, Olch A, Mihailidis D, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendation of AAPM Task Group No. 218. Med Phys 2018; 45 (4): e53–e83]. Figs. 2 and 3 illustrate failure rate as a function of MSFapprox for IMRT and VMAT delivery, respectively. Over the ranges studied, with MSFapprox up to 8 for IMRT and 5 for VMAT, this was not shown to be the case for gamma criteria of either 3%/3mm or 3%/2mm, with “low” correlation being the highest level attained. The reduction in gamma criteria does not appear to improve this correlation appreciably given the manner in which we have described plan complexity.
We have found minimal correlation between gamma analysis and clinical endpoints, as well as the lack of correlation with plan complexity. Given our clinical practice as described, we believe the suggested stricter gamma criteria to be unwarranted. These findings may not be translatable for other scenarios. Knowing this, without correlation between gamma analysis results and patient outcomes or at least a 3D-reconstructed plan deliverability-based rejection process, arbitrarily decreasing gamma criteria for plan acceptance may do nothing more than add unnecessary treatment start delays and additional, unfounded work for the medical physicist/dosimetrist. Perhaps patient-specific delivery QA and subsequent gamma analysis, following establishment of an appropriate QA process through the SPC method, can be used to verify endpoints that may be coarse but crucial such as:
• Is there evidence of modulation (leaf sequencing)?
• Has the expected absolute dose been delivered?
• Does the shape of the delivered dose distribution resemble the planned distribution?
• Are expected bulges and concavities in the distribution present in the appropriate locations?
• Do the measured and TPS-calculated isodoses coincide appropriately over the range evaluated?
This is fundamentally based on the delivery of physical dose and issues will not resolve if measurements are only made once at limited points. The question is whether gamma analysis is simply to help avoid RT accidents (severe dose errors or target misses) or can potentially improve clinical outcome. For example, a point dose measurement with a 5% dose action level could potentially avoid significant dose errors, but reducing it to 2% would not improve the clinical outcome.

### Conventional fractionation VMAT and site-specific tolerance limits

The TG-218 report suggests that plan complexity varies as a function of treatment site and implies that good practice includes different tolerance limits for H&N vs. prostate cases, as an example. We evaluated gamma passing rates at 3%/2mm (10% threshold) for VMAT delivery of H&N, Pelvis, Thorax, and Abdomen plans, all with respect to plan complexity as previously defined. We found correlation with plan complexity to be low for all QA results across all groups. In fact, H&N demonstrated the lowest plan complexity with an average MSFapprox of 2.6, and the second highest ALSPC with a value of 88.9%, with only Abdomen having a higher value. The ALSPC decreases in the order 90.9, 88.9, 87.7 and 85.6% for Abdomen, H&N, Pelvis and Thorax, respectively. However, plan complexity as measured by the MSFapprox is actually greater for Pelvis (3.4) than Thorax (3.2). Given the critical structure limits, particularly low-dose limits for lung when treating the thorax, and the increased use of partial arcs, the changes in SPC-derived AL may be device related and not necessarily plan complexity related as defined here.
VMAT is rarely used for thoracic treatment outside the SBRT setting at our institution. This is evidenced by the lower number of cases available for evaluation in this body site as well as the higher coefficient of variation (CV) of 3.08%, where
$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100$
(7)

Removing thorax from this study would result in a TOTAL CV of approximately 2.31 and tighter variation between subgroups, further supporting a singular TL and AL for all groups. As it stands the Thorax delivery QA values are more variable relative to their mean than Pelvis, Abdomen or H&N.
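Eq. (7) and the effect of dropping Thorax can be checked numerically. The sketch below uses the sample standard deviation (an assumption, since the estimator is not stated) and observes that the TOTAL CV values quoted here coincide with the simple mean of the site CVs in Table 3; whether the authors derived them this way is also an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Eq. (7): standard deviation divided by mean, times 100."""
    return stdev(values) / mean(values) * 100.0  # assumption: sample standard deviation

# Site CVs as listed in Table 3
site_cv = {"Abdomen": 2.12, "H&N": 2.18, "Pelvis": 2.63, "Thorax": 3.08}
cv_all = mean(site_cv.values())
cv_without_thorax = mean(v for site, v in site_cv.items() if site != "Thorax")
print(f"mean CV, all sites: {cv_all:.2f}; without Thorax: {cv_without_thorax:.2f}")

# Hypothetical passing rates (%) to show the function itself
print(f"example CV: {coefficient_of_variation([90.0, 100.0, 110.0]):.1f}")
```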

### VMAT SBRT

The low correlation of passing rate with PTV volume, at the reduced gamma criteria, was not clinically meaningful. While opposite to our initial hypothesis, this may be attributed to more complex shapes with larger volume targets. It seems intuitive that larger volume targets may share proximity with critical structures with strict dose volume limits and may result in concave dose distributions. The high dose gradients associated with these concavities, resulting from the conformity of the dose distribution, may present more difficulties for arrays with lower detector resolution. Additionally, given the 0.76 cm center-to-center spacing in our chamber array, the higher passing low-volume targets, typically more spherical or ellipsoidal in shape, may reflect positively with respect to the interpolation method used with this device. It is of note that smaller diameter targets present challenges for delivery QA, which may in part be a function of detector resolution as reported by others [Woon W, Ravindran PB, Ekayanake P, et al. A study on the effect of detector resolution on γ index passing rate for VMAT and IMRT QA; Bruschi A, Esposito M, Pini S, Ghirelli A, Zatelli G, Russo S. How the detector resolution affects the clinical significance of SBRT pre-treatment quality assurance results]. Raw readings are measured at the detector resolution defined by the inherent detector spacing. The treatment planning system exported dose distribution is at a resolution of 1 mm × 1 mm. If a “peak” in the dose distribution occurs between detectors, the planning system will export this; however, the peak will not be picked up by the chambers during measurement. All smoothing of the measurement to finer resolution is based on interpolation. If none of the chambers detect the high dose region(s) then the interpolated isodose lines cannot contain it. Any following gamma comparison will then fail (depending on the magnitude of the dose difference between the exported peak and the interpolated dose value at the same position). One practical way of solving this is to shift the array to ensure the dose peak coincides with a chamber location. This value is then available in the interpolations and helps to minimize this effect.
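The missed-peak effect can be illustrated with a one-dimensional toy model. The sketch assumes a hypothetical profile (an 80% plateau with a narrow 20% Gaussian peak), point-like detectors at the 7.6 mm spacing quoted in the text, and linear interpolation between readings; the device's actual interpolation method and chamber volume averaging are not modelled.

```python
import math

SPACING_MM = 7.6  # detector center-to-center spacing from the text

def planned_dose(x_mm, peak_mm=3.8, sigma_mm=2.0):
    """Hypothetical planned profile (%): plateau plus a narrow peak."""
    return 80.0 + 20.0 * math.exp(-0.5 * ((x_mm - peak_mm) / sigma_mm) ** 2)

def interpolated_max(offset_mm, half_width=3):
    """Largest value a linear interpolation of the detector readings can reach.

    Linear interpolation never exceeds its node values, so the maximum of
    the interpolated profile equals the maximum detector reading.
    """
    positions = [offset_mm + i * SPACING_MM for i in range(-half_width, half_width + 1)]
    return max(planned_dose(p) for p in positions)

peak_between = interpolated_max(offset_mm=0.0)     # peak midway between two detectors
peak_on_chamber = interpolated_max(offset_mm=3.8)  # array shifted onto the peak
print(f"planned peak: {planned_dose(3.8):.1f}%")
print(f"interpolated maximum, unshifted array: {peak_between:.1f}%")
print(f"interpolated maximum, shifted array:   {peak_on_chamber:.1f}%")
```

In this toy case the unshifted array reports a maximum well below the planned peak, while shifting the array by half the detector spacing recovers it, which is the practical remedy described above.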
The lack of correlation of gamma results with clinical endpoints remains an issue even with SBRT [Fredh et al.; Heilemann et al.; Kim et al.; Xia et al.]. Xia et al. evaluated 4 different QA systems for SRS and SBRT delivery and found that it is possible to apply stricter, device-specific gamma criteria for analysis with acceptable action and tolerance limits. They demonstrated tightening the criteria from the commonly used standard of 3%/3 mm to as low as 3%/1 mm; however, they found no correlation with plan complexity [Xia et al.]. In the current work we have also demonstrated that the gamma criteria can be tightened to 2%/2 mm at a 50% threshold with acceptable action limits and no correlation with plan complexity. In addition, we found no correlation with PTV volume, and none of the cases failing these reduced criteria would have resulted in treatment delays, further highlighting the lack of correlation with clinical endpoints.
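To make the roles of the dose-difference criterion, the distance-to-agreement (DTA) criterion and the low-dose threshold concrete, the following is a minimal brute-force sketch of a one-dimensional gamma passing rate. It is not the analysis software used in this work: global normalization to the maximum reference dose, the choice of which distribution is treated as the reference, and the synthetic profiles are all assumptions made for illustration.

```python
import numpy as np


def gamma_pass_rate(x_mm, ref_dose, eval_dose, dose_pct, dta_mm, threshold_pct):
    """Percent of above-threshold reference points with gamma <= 1 (1D, global)."""
    x_mm = np.asarray(x_mm, dtype=float)
    ref_dose = np.asarray(ref_dose, dtype=float)
    eval_dose = np.asarray(eval_dose, dtype=float)
    d_max = ref_dose.max()
    dose_tol = dose_pct / 100.0 * d_max              # global normalization (assumed)
    keep = ref_dose >= threshold_pct / 100.0 * d_max  # low-dose threshold
    # Pairwise terms: rows are reference points, columns are evaluated points.
    dist = (x_mm[keep, None] - x_mm[None, :]) / dta_mm
    diff = (ref_dose[keep, None] - eval_dose[None, :]) / dose_tol
    gamma = np.sqrt(dist**2 + diff**2).min(axis=1)
    return 100.0 * np.mean(gamma <= 1.0)


# Synthetic profiles (hypothetical): evaluated dose is shifted 2.5 mm and scaled by 1%.
x = np.arange(0.0, 100.5, 0.5)
ref = 2.0 * np.exp(-0.5 * ((x - 50.0) / 12.0) ** 2)
ev = 1.01 * 2.0 * np.exp(-0.5 * ((x - 52.5) / 12.0) ** 2)

rate_33 = gamma_pass_rate(x, ref, ev, dose_pct=3, dta_mm=3, threshold_pct=10)
rate_32 = gamma_pass_rate(x, ref, ev, dose_pct=3, dta_mm=2, threshold_pct=10)
rate_22_50 = gamma_pass_rate(x, ref, ev, dose_pct=2, dta_mm=2, threshold_pct=50)
print(rate_33, rate_32, rate_22_50)
```

At a fixed threshold, tightening the DTA can only lower or preserve the passing rate. Raising the threshold changes which points are scored, so rates at different thresholds are not directly comparable.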
Work has been reported that would allow a migration to delivery-based QA, in which a 3D dose is calculated for comparison with the initial, physician-approved treatment plan through DVH analysis [Stasi et al.; Renner et al. 2003; Renner et al. 2005; van Elmpt et al.; Boggula et al.; Godart et al.; Zhen et al.; Wu et al.; Nakaguchi et al.; Chan et al.]. This would allow the physician to make decisions based on dose-volume limits and dose distributions, as at initial plan acceptance, providing a direct comparison with clinical endpoints without having to decipher potential deviations of dose or DTA at the extremes of low- and high-gradient regions. Given the work presented here, we would advocate that the excellent methods described in the TG-218 report be followed to achieve a level of conformity between centers, and that SPC methods be used to ensure that the performance of individual programs compares favorably with the universal action limit as defined. However, we fail to find an advantage in tightening the gamma criteria from 3%/3 mm to 3%/2 mm (threshold = 10%).
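For readers less familiar with the SPC limits referred to above, the sketch below follows our understanding of the TG-218 formulation: the action limit is taken as 100 − ΔA/2 with ΔA = β·sqrt(σ² + (x̄ − T)²), β = 6.0 and T = 100%, and the tolerance limit is the lower control limit of an individuals chart, x̄ − 2.660 × (mean moving range). The passing rates shown are hypothetical and are not data from this study; the use of the sample standard deviation is likewise an assumption.

```python
import numpy as np


def spc_limits(passing_rates, target=100.0, beta=6.0):
    """Return (tolerance_limit, action_limit) in percent from a series of passing rates."""
    rates = np.asarray(passing_rates, dtype=float)
    mean = rates.mean()
    sigma = rates.std(ddof=1)  # sample standard deviation (assumption)
    # Action limit: half the action-limit width below the 100% target.
    delta_a = beta * np.sqrt(sigma**2 + (mean - target) ** 2)
    action_limit = target - delta_a / 2.0
    # Tolerance limit: lower control limit of an individuals (I) chart.
    mean_moving_range = np.abs(np.diff(rates)).mean()
    tolerance_limit = mean - 2.660 * mean_moving_range
    return tolerance_limit, action_limit


# Hypothetical gamma passing rates (%) listed in chronological order.
example_rates = [98.2, 97.5, 99.1, 96.8, 98.9, 97.7, 99.4, 98.0]
tl_spc, al_spc = spc_limits(example_rates)
print(f"TL_SPC = {tl_spc:.1f}%, AL_SPC = {al_spc:.1f}%")
```

Because the moving range depends on the order of the measurements, the rates should be supplied in the order in which the QA was performed.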
As indicated previously, our delivery QA process follows the recommendations of the AAPM TG-218 report. In addition, the software/hardware system used is associated with an answer of “Yes” to all questions in the vendor survey section of that report except for providing auto registration or assuming the center of each image as a common point. We therefore believe this work to be generalizable to centers practicing similarly and with comparable QA systems, but would suggest caution and evaluation of a center-specific patient cohort to verify.

## Conclusions

From this study it appears possible to tighten our gamma criteria from 3%/3 mm to 3%/2 mm (10% threshold) as suggested in AAPM TG-218. However, the change from 0% to 5% of cases failing the ALuniv could delay treatment starts approximately 2 times/month, given a monthly average of 40 new IMRT/VMAT patients for whom delivery QA is performed the night before treatment initiation. In addition, none of the regions failing the gamma criteria in these cases correlated with clinically significant endpoints. The SPC limits derived from our clinical data sets indicate that our fixed-beam IMRT delivery process is inferior to that of VMAT, as illustrated by its greater deviation from the ALuniv; this warrants further investigation.
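As a rough check of the delay estimate, assuming the 5% failure rate applies uniformly to the 40 new cases per month (the symbols are introduced here for illustration only):

$$
N_{\text{delayed}} \approx f_{\text{fail}} \times N_{\text{new}} = 0.05 \times 40 = 2 \ \text{per month}
$$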
In this study we found that failure to pass gamma criteria as a function of plan complexity, as described by the MSFapprox, exhibited little to no correlation for IMRT or VMAT delivery over the ranges studied. Additionally, our analysis of failure rates as a function of VMAT plan complexity across different body sites did not support site-specific action limits.
Following VMAT SBRT delivery QA evaluation using the array specified, it was found that results do not strongly correlate with target size over the range studied and it can be argued that no case in this study would have been delayed for gamma failure either inside or outside the PTV. Variance resulting in higher than predicted dose outside the target was found to be acceptable in all cases.
We find the methodology described in TG-218 to be an excellent guide to assure more uniform IMRT/VMAT delivery QA results throughout the radiation oncology community. However, through this work we demonstrate that the suggested stricter gamma criteria may not strongly predict clinically meaningful delivery errors.

## Author contribution statement

All authors contributed to the acquisition, analysis and interpretation of the data associated with this work and have critically revised drafts and given approval for the final version for submission for publication.
The authors have no competing interests to declare.
This publication was supported by grant number P30 CA006927 from the National Cancer Institute, NIH. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## References

• Miften M, Olch A, Mihailidis D, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendation of AAPM Task Group No. 218. Med Phys. 2018; 45 (4): e53-e83
• Van Esch A, Clermont C, Devillers M, et al. On-line quality assurance of rotational radiotherapy treatment delivery by means of a 2D ion chamber array and the Octavius phantom. Med Phys. 2007; 34: 3825-3837
• Masi L, Casamassima F, Doro R, et al. Quality assurance of volumetric modulated arc therapy: Evaluation and comparison of different dosimetric systems. Med Phys. 2011; 38: 612-621
• Jursinic PA, Sharma R, Reuter J. MapCHECK used for rotational IMRT measurements: Step-and-shoot, Tomotherapy, RapidArc. Med Phys. 2010; 37: 2837-2846
• Kruse JJ. On the insensitivity of single field planar dosimetry to IMRT inaccuracies. Med Phys. 2010; 37: 2516-2524
• Nelms BE, Zhen H, Tome WA. Per-beam, planar IMRT QA passing rates do not predict clinically relevant patient dose errors. Med Phys. 2011; 38: 1037-1044
• Zhen H, Nelms BE, Tome WA. Moving from γ passing rates to patient DVH-based QA metrics in pretreatment dose QA. Med Phys. 2011; 38: 5477-5489
• Carrasco P, Jornet N, Latorre A, et al. 3D DVH-based metric analysis versus per-beam planar analysis in IMRT pretreatment verification. Med Phys. 2012; 39: 5040-5049
• Stasi M, Bresciani S, Miranti A, et al. Pretreatment patient-specific IMRT quality assurance: A correlation study between γ index and patient clinical dose volume histogram. Med Phys. 2012; 39: 7626-7634
• Coleman L, Skourou C. Sensitivity of volumetric modulated arc therapy patient specific QA results to multileaf collimator errors and correlation to dose volume histogram based metrics. Med Phys. 2013; 40 (11): 111715-1-111715-7
• Glenn MC, Hernandez V, Saez J, Followill DS, Howell RM, Pollard-Larkin JM, et al. Treatment plan complexity does not predict IROC Houston anthropomorphic head and neck phantom performance. Phys Med Biol. 2018; 63: 205015
• Ouyang L, Gu X, Pompoš A, Bao Q, Solberg TD. Breaking bad IMRT QA practice. J Appl Clin Med Phys. 2015; 16: 154-165
• Woon W, Ravindran PB, Ekanayake P, et al. A study on the effect of detector resolution on γ index passing rate for VMAT and IMRT QA. J Appl Clin Med Phys. 2018; 19: 230-248
• Bruschi A, Esposito M, Pini S, Ghirelli A, Zatelli G, Russo S. How the detector resolution affects the clinical significance of SBRT pre-treatment quality assurance results. Physica Med. 2018; 49: 129-134
• Fredh A, Scherman JB, Fog LS, Munck af Rosenschöld P. Patient QA systems for rotational radiation therapy: a comparative experimental study with intentional errors. Med Phys. 2013; 40: 031716
• Heilemann G, Poppe B, Laub W. On the sensitivity of common γ-index evaluation methods to MLC misalignments in Rapidarc quality assurance. Med Phys. 2013; 40
• Kim J, Park S, Kim H, et al. The sensitivity of γ-index method to the positioning errors of high-definition MLC in patient-specific VMAT QA for SBRT. Radiat Oncol. 2014; 9: 1-12
• Xia Y, Zlateva Y, et al. Application of TG-218 action limits to SRS and SBRT pre-treatment patient specific QA. J Radiosurgery SBRT. 2020; 7: 135-147
• Renner WD, Sarfaraz M, Earl MA, et al. A dose delivery verification method for conventional and intensity modulated radiation therapy using measured field fluence distributions. Med Phys. 2003; 30: 2996-3005
• Renner WD, Norton K, Holmes T. A method for deconvolution of integrated electronic portal images to obtain incident fluence for dose reconstruction. J Appl Clin Med Phys. 2005; 6: 22-39
• van Elmpt W, Nijsten S, Mijnheer B, Dekker A, Lambin P. The next step in patient-specific QA: 3D dose verification of conformal and intensity-modulated RT based on EPID dosimetry and Monte Carlo dose calculations. Radiother Oncol. 2008; 86: 86-92
• Boggula R, Lorenz F, Mueller L, et al. Experimental validation of a commercial 3D dose verification system for intensity-modulated arc therapies. Phys Med Biol. 2010; 55: 5619-5633
• Godart J, Korevaar EW, Visser R, et al. Reconstruction of high-resolution 3D dose from matrix measurements: error detection capability of the COMPASS correction kernel method. Phys Med Biol. 2011; 56: 5029-5043
• Wu C, Hosier KE, Beck KE, et al. On using 3D γ-analysis for IMRT and VMAT pretreatment plan QA. Med Phys. 2012; 39: 3051-3059
• Nakaguchi Y, Araki F, Maruyama M, Saiga S. Dose verification of IMRT by use of a COMPASS transmission detector. Radiol Phys Technol. 2012; 5: 63-70
• Chan MF, Li J, Schupak K, et al. Using a novel QA dose tool to quantify the impact of systematic errors otherwise undetected by conventional QA methods: clinical head and neck case studies. Technol Cancer Res Treat. 2014; 13: 57-67