Full length article| Volume 99, P31-43, July 01, 2022

# Image-based data mining applies to data collected from children

Open AccessPublished:May 21, 2022

## Highlights

• The applicability of the complete IBDM workflow to pediatric data was assessed for the first time.
• IBDM is robustly applicable to pediatric data.
• IBDM reliably identifies systematic dose differences as small as 1%.
• IBDM performance is sensitive to characteristics of the dose and incidence rate.

## Abstract

### Purpose

Image-based data mining (IBDM) is a novel voxel-based method for analyzing radiation dose responses that has been successfully applied in adult data. Because anatomic variability and side effects of interest differ for children compared to adults, we investigated the feasibility of IBDM for pediatric analyses.

### Methods

We tested IBDM with CT images and dose distributions collected from 167 children (aged 10 months to 20 years) who received proton radiotherapy for primary brain tumors. We used data from four reference patients to assess IBDM sensitivity to reference selection. We quantified spatial-normalization accuracy via contour distances and deviations of the centers-of-mass of brain substructures. We performed dose comparisons with simplified and modified clinical dose distributions with a simulated effect, assessing their accuracy via sensitivity, positive predictive value (PPV) and Dice similarity coefficient (DSC).

### Results

Spatial normalizations and dose comparisons were insensitive to reference selection. Normalization discrepancies were small (average contour distance < 2.5 mm, average center-of-mass deviation < 6 mm). Dose comparisons identified differences (p < 0.01) in 81% of simplified and all modified clinical dose distributions. The DSCs for simplified doses were high (peak frequency magnitudes of 0.9–1.0). However, the PPVs and DSCs were low (maximum 0.3 and 0.4, respectively) in the modified clinical tests.

### Conclusions

IBDM is feasible for childhood late-effects research. Our findings may inform cohort selection in future studies of pediatric radiotherapy dose responses and facilitate treatment planning to reduce treatment-related toxicities and improve quality of life among childhood cancer survivors.

## 1. Introduction

Thanks to major advancements in cancer care, 85% of pediatric oncology patients now survive 5 years or more, and many of them become long-term survivors [
• Noone A.
• Krapcho M.
• Miller D.
• Brest A.
• Yu M.
• et al.
SEER Cancer Statistics Review, 1975–2016.
]. Among the cancer treatment options available today, radiotherapy remains one of the most commonly used tools, involved in the treatment of approximately half of all childhood cancer cases [
• Hudson M.S.
Jude LIFE Study Population.
]. Radiation, however, is associated with toxicities that increase the risk of severe chronic health conditions, including cardiac, endocrine, and neurologic conditions, in long-term survivors of childhood cancer [
• Suh E.
• Stratton K.L.
• Leisenring W.M.
• Nathan P.C.
• Ford J.S.
• Freyer D.R.
• et al.
Late mortality and chronic health conditions in long-term survivors of early-adolescent and young adult cancers: a retrospective cohort analysis from the Childhood Cancer Survivor Study.
]. Therefore, radiotherapy research increasingly focuses on mitigating or avoiding its long-term sequalae. Such research either directly [
• Wilson L.J.
• Newhauser W.D.
Generalized approach for radiotherapy treatment planning by optimizing projected health outcome: preliminary results for prostate radiotherapy patients.
,
• Langendijk J.A.
• Lambin P.
• De Ruysscher D.
• Widder J.
• Bos M.
• Verheij M.
Selection of patients for radiotherapy with protons aiming at reduction of side effects: The model-based approach.
,
• Rechner L.A.
• Eley J.G.
• Howell R.M.
• Zhang R.
• Mirkovic D.
• Newhauser W.D.
Risk-optimized proton therapy to minimize radiogenic second cancers.
] or indirectly [
• Wu Q.
• Mohan R.
Algorithms and functionality of an intensity modulated radiotherapy optimization system.
] relies on our understanding of the relation between the physical dose of radiation delivered to a specific organ or structure and its biologic effects, known as the dose–response relationship. Unfortunately, the mathematical form and precise anatomic location relevant to this relation is often uncertain or unknown [
• ASTRO
Quantitative analyses of normal tissue effects in the clinic (QUANTEC).
].
Conventional methods of elucidating dose–response relationships involve dose-volume analyses. These methods summarize heterogeneous 3D dose distributions into descriptive metrics for whole tissues and organs. This limits analysis, however, to an a priori selection of a critical organ, sub-organ, or tissue. Recent developments in radiation biology have revealed that this selection is not as straight forward as previously thought. For example, some biologic effects are associated with radiation doses deposited at distant anatomic locations [
• Siva S.
• MacManus M.P.
• Martin R.F.
• Martin O.A.
], whereas others appear to be associated with the radiation doses in only a portion or sub-volume of an organ [
• Schwartz D.L.
• Hutcheson K.
• Barringer D.
• Tucker S.L.
• Kies M.
• Holsinger F.C.
• et al.
Candidate Dosimetric Predictors of Long-Term Swallowing Dysfunction After Oropharyngeal Intensity-Modulated Radiotherapy.
,
• Toussaint L.
• Indelicato D.J.
• Stokkevåg C.H.
• Pedro C.
• Mikkelsen R.
• et al.
Radiation doses to brain substructures associated with cognition in radiotherapy of pediatric brain tumors.
]. Furthermore, the relevance of such summary statistics is unclear in the age of highly conformal radiation treatments with complex dose distributions [
• Chiavassa S.
• Bessieres I.
• Edouard M.
• Mathot M.
• Moignier A.
Complexity metrics for IMRT and VMAT plans: a review of current literature and applications.
]. The widespread availability of such highly conformal radiation treatments also increases the importance of precisely understanding radiation sensitivity at a sub-organ level because it is now often possible to optimize treatments to avoid specific sensitive volumes, if they are known.
Voxel-based analysis methods, also known as image-based data mining (IBDM), have gained popularity in recent years for their ability to maintain the spatial complexity of entire 3D dose distributions, their freedom from a priori anatomic assumptions, and their independence from structure delineations [
• Palma G.
• Monti S.
• Cella L.
Voxel-based analysis in radiation oncology: A methodological cookbook.
]. These methods comprise two primary steps: spatial normalization to a reference anatomy and voxelized dose comparison. Several studies have demonstrated the potential of IBDM to reveal details of the dose–response relation that had eluded traditional dose-volume methods [
• Chen C.
• Witte M.
• Heemsbergen W.
• van Herk M.
Multiple comparisons permutation test for image based data mining in radiotherapy.
,
• Acosta O.
• Drean G.
• Ospina J.D.
• Simon A.
• Haigron P.
• Lafond C.
• et al.
Voxel-based population analysis for correlating local dose and rectal toxicity in prostate cancer radiotherapy.
,
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
,
• Palma G.
• Monti S.
• D'Avino V.
• Conson M.
• Liuzzi R.
• Pressello M.C.
• et al.
A Voxel-Based Approach to Explore Local Dose Differences Associated With Radiation-Induced Lung Damage.
,

Witte MG, Heemsbergen WD, Bohoslavsky R, Pos FJ, Al-Mamgani A, Lebesque JV, et al. Relating Dose Outside the Prostate With Freedom From Failure in the Dutch Trial 68 Gy vs. 78 Gy. Int J Radiat Oncol. 2010;77:131-8.

]. For example, one study revealed that radiation dose in a small region across the base of the heart strongly correlates with the overall survival of patients with lung cancer [
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
]. Another study of radiation-induced trismus (difficulty in mouth opening) revealed a significant correlation between the severity and the dose in and near the ipsilateral masseter [
• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
]. Correlations with the IBDM-identified volume were stronger than that in a priori-identified structures. All of these studies, however, focused on data acquired from and health effects relevant to adults.
Dose-response relationships for childhood cancer treatments represent a niche field with unique challenges. Childhood cancers are histologically distinct from adult cancers and are exceedingly rare, accounting for only 1% of all cancer cases [
• Noone A.
• Krapcho M.
• Miller D.
• Bishop K.
• Kosary C.
• et al.
SEER Cancer Statistics Review, 1975–2014.
]. Therefore, pediatric-specific study cohorts accumulate slowly. Furthermore, not only are children’s young bodies inherently more vulnerable to radiation damage than adults’, but they also face several decades of survivorship longer than is expected of most adult survivors. As a result, the specific side effects of interest among childhood cancer survivors often differ from those for survivors of adult cancer. For example, heart disease [
• Mulrooney D.A.
• Armstrong G.T.
• Huang S.
• Ness K.K.
• Ehrhardt M.J.
• Joshi V.M.
• et al.
Cardiac Outcomes in Adult Survivors of Childhood Cancer Exposed to Cardiotoxic Therapy: A Cross-sectional Study.
] and subsequent malignant neoplasms [
• Facoetti A.
• Barcellini A.
• Valvo F.
• Pullia M.
The Role of Particle Therapy in the Risk of Radio-induced Second Tumors: A Review of the Literature.
] are of particular concern to childhood cancer survivors due to their long latency of a decade or more. Both of these effects are also associated with low doses of radiation [
• Darby S.C.
• Ewertz M.
• McGale P.
• Bennet A.M.
• Blom-Goldman U.
• Brønnum D.
• et al.
Risk of Ischemic Heart Disease in Women after Radiotherapy for Breast Cancer.
,

Bates JE, Shrestha S, Liu Q, Mulrooney DA, Smith SA, Leisenring WM, et al. Low-dose radiation to cardiac substructures and late-onset cardiac disease: A report from the Childhood Cancer Survivor Study (CCSS). J Clin Oncol. 2021;39:10027.

,
• Diallo I.
• Samand A.
• Quiniou E.
• Chavaudra J.
• et al.
Frequency Distribution of Second Solid Cancer Locations in Relation to the Irradiated Volume among 115 Patients Treated for Childhood Cancer.
]. This differs from adult dose–response studies, which tend to focus on high-dose volumes [
• ASTRO
Quantitative analyses of normal tissue effects in the clinic (QUANTEC).
].
Many of these unique challenges also complicate the applicability of IBDM to data from children. Although Palma et al. [
• Palma G.
• Monti S.
• Pacelli R.
• Liao Z.
• Deasy J.O.
• Mohan R.
• et al.
Radiation Pneumonitis in Thoracic Cancer Patients: Multi-Center Voxel-Based Analysis.
] recently demonstrated the utility of IBDM for adult multi-center studies, the relatively larger and age-related anatomic variability characteristic of childhood, which may result in larger uncertainties associated with the spatial normalization needed for IBDM, remained an outstanding question. A recent study investigated this spatial normalization in CT images from a small cohort of children and suggested that satisfactory normalization is feasible [
• Veiga C.
• Lim P.
• Anaya V.M.
• Chandy E.
• D’Souza D.
• et al.
Atlas construction and spatial normalisation to facilitate radiation-induced late effects research in childhood cancer.
].
Many childhood cancers occur in the brain, where MRI is often used because of its superior soft-tissue contrast. However, CT-based analyses remain useful thanks to their direct relevance to radiotherapy treatment planning data. In addition, because large pediatric cohorts are rare and require accrual over many years, the heterogeneity of MR protocols and image quality across time or between centers in multi-institution studies, as are needed to increase small pediatric cohort sizes, present other challenges to the IBDM workflow. Hence, a logical next step will be to develop IBDM using MRI to improve soft-tissue spatial normalization in the brain [
• Monti S.
• Paganelli C.
• Buizza G.
• Preda L.
• Valvo F.
• Baroni G.
• et al.
A novel framework for spatial normalization of dose distributions in voxel-based analyses of brain irradiation outcomes.
,
• Skaarup M.
• Lundemann M.J.
• Darkner S.
• Jørgensen M.
• Marner L.
• Mirkovic D.
• et al.
A framework for voxel-based assessment of biological effect after proton radiotherapy in pediatric brain cancer patients using multi-modal imaging.
] but the feasibility of the full IBDM pipeline in data from children must first be established, and CT offers valuable information to this effect. For example, we must first determine whether residual uncertainties (due to large anatomic variations) from the normalization process affect the ability of IBDM to identify a small “sensitive” sub-volume in a large population of children. The specific consequences of the characteristics of the dose distribution (e.g., dose gradient) are also unknown. Quantifying those uncertainties is important because they will inform sample-size and power calculations in future studies of the late effects of radiotherapy in children.
Here, we present a comprehensive end-to-end test of IBDM in CT data obtained from a large population of children for a simulated effect. We tested the performance of a published binary IBDM approach and quantified the role of three key steps in IBDM: selection of reference anatomy, sensitivity to the dose-distribution characteristics (using both simplified and modified clinical dose distributions), and robustness to the incidence rate (i.e., relative frequency) of the effect of interest in the population by using incidence rates representative of side effects associated with cranial irradiation in children.

## 2. Methods

We begin this section by defining our patient population and hypotheses (Section 2.1) before, for the reader’s convenience, providing a brief description of the published IBDM method that we implemented in this study (Section 2.2). The sections that follow (2.3 to 2.6) describe our comprehensive end-to-end test in detail.

### 2.1 Population

We tested the applicability of IBDM to pediatric studies by using data from a cohort of 167 children. These children previously received proton radiotherapy for primary craniopharyngioma brain tumors at St. Jude Children’s Research Hospital. Craniopharyngioma is a centrally located tumor arising in the suprasellar region of the intracranial compartment. We selected this cohort for its wide age range (10 months to 20 years), even sex distribution, and well-defined treatment protocol (Table 1). The patient data used in this study included their treatment-planning CT images, structure sets approved by a board-certified radiation oncologist, and clinically delivered radiotherapy dose distributions exported from a commercial treatment-planning system (Eclipse v13.7, Varian Medical Systems, Palo Alto, CA). All patients received focal proton therapy comprising two (lateral opposed) or three (lateral opposed plus apex) beams. All treatment-planning CT images were acquired in the head-first supine orientation with pixels ranging in size from 0.78 to 1.17 mm. All but one image (1.5 mm thickness) had a slice thickness of 1 mm. Structure sets for all patients included delineations of the target volumes and five organs at risk: brainstem, left cochlea, right cochlea, left optic nerve and right optic nerve. All structures were contoured according to the RT2CR (NCT01419067) or RT3CR (NCT02792582) clinical trial protocols. This study was approved by the St. Jude Institutional Review Board.
Table 1Demographic and treatment details of children included in this study.
CharacteristicValue
No. patients167
No. males (%)83 (49.7)
Mean age ± 1σ [y]9.6 ± 4.6
Mean target volume ± 1σ [mm3]26.3 ± 19.1
Prescribed dose [GyRBE]54
Prescribed no. fractions30
No. who received chemotherapy (%)0 (0)
No. who received surgery (%)167 (100)
We used these data to test the hypothesis that it is feasible to use IBDM with data obtained from children to identify a sensitive sub-volume in which a systematic difference in dose deposition is associated with a binary effect of interest.

### 2.2 IBDM methodology

We performed the two primary steps of IBDM analyses by using the framework and software developed at the University of Manchester [
• Chen C.
• Witte M.
• Heemsbergen W.
• van Herk M.
Multiple comparisons permutation test for image based data mining in radiotherapy.
,
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
,
• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
,
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
,
• McWilliam A.
• Dootson C.
• Graham L.
• Banfill K.
• Abravan A.
• van Herk M.
Dose surface maps of the heart can identify regions associated with worse survival for lung cancer patients treated with radiotherapy.
].

#### 2.2.1 Spatial normalization

Spatial normalization in IBDM aims to map individual patient anatomy and the related treatment information (e.g., dose distributions) to a common reference anatomy. IBDM generally performs this normalization through a two-step process: rigid, or affine image registration to grossly align each patient to the reference anatomy, followed by deformable image registration (DIR) to fine-tune the normalization. We implemented three iterations of the rigid registration, allowing scaling only, then translation only, and then translation and rotation. This preliminary alignment only considers the anatomy within a region of interest that is automatically generated according to clinically delineated structures. We defined our region of interest as the bounding box encompassing the clinically delineated brain contour expanded by a 5 mm margin. Next, the IBDM DIR uses a B-spline DIR algorithm in the Nifty Registration package (NiftyReg, KCL, UK) [
• Modat M.
• Ridgway G.R.
• Taylor Z.A.
• Lehmann M.
• Barnes J.
• Hawkes D.J.
• et al.
Fast free-form deformation using graphics processing units.
] with the default registration parameters and regularization via a bending-energy penalty term. This step considers the entire patient image (i.e., anatomy both within and outside the region of interest).
The deformation matrix that results from the anatomic spatial normalization is applied to the associated radiotherapy treatment data (e.g., dose distributions and structure delineations) to obtain versions of each individual’s radiotherapy treatment data propagated to the reference anatomy. IBDM aims to take the uncertainties associated with the spatial normalization into account during the voxelized dose comparison by applying a 3D Gaussian blurring kernel to each deformed dose distribution prior to performing the dose comparison. IBDM defines this blurring kernel by using the standard deviation (SD) in the center-of-mass locations of structures that are routinely delineated and within the registration region of interest. Specifically, it averages the SD in each of the superior-inferior, anteroposterior, and left–right directions across all delineated structures. This SD is calculated relative to the center of mass of each corresponding structure in the reference anatomy. In our study, the clinically delineated structures within the registration region of interest that contributed to this 3D Gaussian blurring kernel were the brainstem, left and right cochlea, and left and right optic nerves.

#### 2.2.2 Voxelized dose comparison

In binary voxel-wise dose comparison, patients are grouped according to a binary effect measure, namely those whose data report an effect of interest in one group (E for “effect”) and those without the effect in the other group (NE for “no effect”). The Manchester Tool Kit (MTK) for binary analysis [
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
,
• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
] calculates a Student t-test statistic in each voxel of the reference anatomy, resulting in a 3D distribution of t-test statistic magnitudes, also referred to as a t-map. The MTK then performs a permutation test to identify the t-values associated with random discrepancies between the radiation dose distributions of the two groups of patients. The permutation test is a nonparametric test [
• Camargo A.
• Azuaje F.
• Wang H.
• Zheng H.
Permutation - based statistical tests for multiple hypotheses.
] that conservatively corrects for multiple comparisons to protect against erroneously significant results, which are likely given the large sample size [
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
,
• Dudoit S.
• Shaffer J.P.
• Boldrick J.C.
Multiple Hypothesis Testing in Microarray Experiments.
,
• Belmonte M.
• Yurgelun-Todd D.
Permutation testing made practical for functional magnetic resonance image analysis.
,
• Groppe D.M.
• Urbach T.P.
• Kutas M.
Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review.
]. Permutation testing randomizes the binary effect labels among the patients in the cohort several times (1,000 times in our study). For each permutation, MTK calculates a new t-map and records the most extreme positive and negative t-test statistics [
• Chen C.
• Witte M.
• Heemsbergen W.
• van Herk M.
Multiple comparisons permutation test for image based data mining in radiotherapy.
,
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
], considering all voxels within the reference anatomy skin contour (i.e., including anatomy within and outside the registration region of interest). This procedure tests the null hypothesis that there is no difference between the average dose distributions of the two groups ($H0:DE¯=DNE¯;H1:DE¯≠DNE¯$). Ranking the recorded extreme t-statistics for all permutations enables one to assess the significance of the observed t-statistic (i.e., that obtained from the true effect labels, before random permutation). We determined statistical significance at the α = 0.01 significance level, which is common in permutation tests to control for the family-wise error rate [
• Groppe D.M.
• Urbach T.P.
• Kutas M.
Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review.
]. Practically, this entails comparing the 99th percentile in the ranked extreme t-statistics with the observed t-statistic. Additionally, the MTK indicates volumes in which the dose relates to the investigated effect by using the ranked extreme t-statistics for various p-values. The volumes are found by segmenting iso-t levels in the observed t-map. The MTK functionality is described in detail elsewhere [
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
].

### 2.3 Assessing the impact of the reference anatomy

We selected the reference anatomy from among the treatment planning CTs of the children in the cohort. To assess the sensitivity of the IBDM analysis to the selection of the reference anatomy, we considered three selection methods: the most representative among the ages, brain volumes, and target volumes. We determined representativeness as the median value because it is less susceptible to outliers than is the mean. Specifically, we selected images from the female and male patients closest to the median age, the patient with the median brain volume, and the patient with the median target volume.

### 2.4 Evaluation of spatial normalization

We visually assessed the performance of the DIR by calculating the mean CT of all registered images for each reference image. Additionally, we evaluated the performance of the DIR via surface-distance and center-of-mass analyses for all delineated structures. For surface-distance analysis, we calculated two distance metrics: the 95% Hausdorff ($DTH,95$) [
• Kim H.S.
• Park S.B.
• Lo S.S.
• Monroe J.I.
• Sohn J.W.
Bidirectional local distance measure for comparing segmentations.
] and average ($DTavg$) distances as,
$DTH,95=maxDTH,95:t,s→t,DTH,95:s→t,t$
(1)

$DTavg=avgDTavg:t,s→t,DTavg:s→t,t$
(2)

respectively, where t represents the reference-anatomy contour and $s→t$ represents the individual subject’s contour propagated to the reference anatomy [
• Veiga C.
• Lim P.
• Anaya V.M.
• Chandy E.
• D’Souza D.
• et al.
Atlas construction and spatial normalisation to facilitate radiation-induced late effects research in childhood cancer.
]. For center-of-mass analysis, we calculated the Euclidian norm of the center-of-mass displacement ($ΔCOM$), or.
$ΔCOM=COMs→t-COMt$
(3)

where t and $s→t$ are defined as above. We calculated the $ΔCOM$ value for each patient and then averaged it for all patients ($ΔCOMavg$).

### 2.5 Simplified dose distributions

To assess the strengths and limitations of IBDM analyses of the pediatric data, we used simple simulated 3D radiation dose distributions inspired by the methods of Chen et al. [
• Chen C.
• Witte M.
• Heemsbergen W.
• van Herk M.
Multiple comparisons permutation test for image based data mining in radiotherapy.
]. By using simulated dose distributions, we could choose the location and size of the sensitive volume of interest and maintained full control of the dosimetric characteristics within and outside the volume of interest. This enabled us to systematically test and quantify the sensitivity of IBDM to the various dose distribution characteristics it may encounter in clinical assessments.
We generated the simplified simulated dose distributions for each individual patient’s anatomy prior to spatial normalization with an in-house code written in MATLAB (vR2019a, MathWorks, Natick, MA) and tools included in the Computational Environment for Radiotherapy Research (version 5.2) [
• Deasy J.O.
• Blanco A.I.
• Clark V.H.
CERR: A computational environment for radiotherapy research.
]. We defined these dose distributions in relative dose units and designed them to simulate analyses of out-of-field dose, as is highly relevant to pediatric late effects, such as second cancers. The test and control dose distributions comprised a homogeneous background dose of 100% assigned everywhere within each patient’s skin contour. The test dose distributions additionally contained a cubic test volume of side length $S$ centered in each patient’s brain contour. The dose in this test volume was higher than the background dose level by an amount $ΔD$. This test volume mimics a sensitive sub-volume in which the dose is associated with an effect of interest. Out-of-field dose is not explicitly calculated by commercial treatment planning systems and so it is often estimated for research purposes using Monte Carlo methods [
• Newhauser W.D.
• Schneider C.
• Wilson L.
• Shrestha S.
• Donahue W.
A review of analytical models of stray radiation exposures from photon- and proton-beam radiotherapies.
]. However, these simulations suffer from statistical noise, especially outside of the therapeutically irradiated volume. Therefore, each test and control dose distribution included a unique additive random noise component (Fig. 1). We determined the random noise on a voxel-by-voxel basis by randomly sampling from a Gaussian distribution (μ = 0; σ = $σnoise$).
To assess the limits of IBDM functionality for pediatric analyses, we varied these simple simulated dose distributions based on three distinct parameters. Specifically, we generated test dose distributions for all combinations of three noise levels ($σnoise$), three test-volume side lengths ($S$), and three dose deviations between the test volume and background dose level ($ΔD$). This produced 27 test conditions. We additionally generated control dose distributions for each of the three $σnoise$ levels, resulting in a total of 30 simulated dose distributions for each patient.
The three noise levels were defined by 1.96 $σnoise$ = 2%, 5%, and 7% (Fig. 2). We selected these levels to span the noise and uncertainty levels representative of out-of-field dose calculations from Monte Carlo simulations [
• Wilson L.J.
• Pirlepesov F.
• Moskvin V.
• Li Z.
• Guo Y.
• Li Y.
• et al.
Proton therapy delivery method affects dose-averaged linear energy transfer in patients.
] to analytical algorithms [
• Wilson L.J.
• Newhauser W.D.
• Schneider C.W.
• Kamp F.
• Reiner M.
• Martins J.C.
• et al.
Method to quickly and accurately calculate absorbed dose from therapeutic and stray photon exposures throughout the entire body in individual patients.
,
• Hauri P.
• Hälg R.A.
• Besserer J.
• Schneider U.
A general model for stray dose calculation of static and intensity-modulated photon radiation.
]. The three test-volume side lengths were $S$ = 0.5 cm, 2.0 cm, and 6.0 cm (Fig. 2). These side lengths were selected to span potential critical-structure volumes on the order of the cochlea to the individual lobes of the brain. The three dose deviations were $ΔD$ = 0.1%, 1%, and 5% (Fig. 2). We selected these relative magnitudes to span from stray-dose magnitudes [
• Wilson L.J.
• Newhauser W.D.
• Schneider C.W.
• Kamp F.
• Reiner M.
• Martins J.C.
• et al.
Method to quickly and accurately calculate absorbed dose from therapeutic and stray photon exposures throughout the entire body in individual patients.
] to clinically acceptable therapeutic dose deviations [
• Moyers M.F.
• Toth T.L.
• Chvetsov A.V.
• Unkelbach J.
• Mohan R.
• et al.
202 - Physical Uncertainties in the Planning and Delivery of Light Ion Beam Treatments.
].

#### 2.5.1 Simulated effects

Binary IBDM analysis requires that patient data be associated with a binary effect label. For this computational test, we assigned each patient’s data a label of Effect (E) or No Effect (NE). Patient data assigned the E label included a test dose distribution (with a higher-dose cube of a certain size, $S$), whereas data assigned the NE label included the control dose distribution (without cube). We only considered E–NE groupings sharing the same level of noise, i.e., both E and NE dose distributions were generated with the same $σnoise$ such that the only systematic difference between dose distributions in the E and NE groups was the presence (or absence) of the test volume.
We assessed the sensitivity of the pediatric IBDM analyses to two characteristics of the binary patient labeling: the relative frequency of the simulated effect (i.e., the relative proportion of E labels among the cohort) and the individual patients who experienced the simulated effect (i.e., the specific patient data assigned the E label). We considered three relative frequencies of E, representing three different incidence rates of the simulated effect of interest: 50%, 20%, and 10%. We considered the relative frequency of 50% because this provides the highest level of statistical power. We selected the 20% and 10% frequencies to approximate the rates of subsequent neoplasms [
• Friedman D.L.
• Whitton J.
• Leisenring W.
• Mertens A.C.
• Hammond S.
• Stovall M.
• et al.
Subsequent Neoplasms in 5-Year Survivors of Childhood Cancer: The Childhood Cancer Survivor Study.
] and vasculopathies [
• Kralik S.F.
• Watson G.A.
• Shih C.S.
• Ho C.Y.
• Finke W.
• Buchsbaum J.
Radiation-Induced Large Vessel Cerebral Vasculopathy in Pediatric Patients With Brain Tumors Treated With Proton Radiation Therapy.
,
• Lucas J.
• LeVine D.
• Ismael Y.
• Hsu C.-Y.
• Darrow K.
• Faught A.
• et al.
Rare-12. Vasculopathy in pediatric craniopharyngioma patients treated with surgery and radiotherapy.
], respectively, which are two effects of interest among survivors of childhood brain cancer who received proton therapy.
We randomly assigned the E or NE labels to the patient data by using the random number generator in MATLAB vR2019a. To generate the 50% frequency labeling, we randomly assigned the E or NE label to the patient data with equal probability, resulting in 84/167 assigned the NE label and 83/167 assigned the E label. Similarly, to generate the 10% grouping, we assigned the E label to 10% of the cohort at random, and the remainder received the NE label. To define the labels for the 20% relative frequency and enable a test for sensitivity to individual patient status (i.e., E or NE labeling), we followed methods similar to those of k-fold cross validation [
• Anguita D.
• Ghio A.
• Ridella S.
• Sterpi D.
K-fold cross validation error rate estimate in support vector machines.
]. First, we randomly partitioned the patient data into five subsections. We then generated five unique binary label combinations by assigning patient data in one subsection the E label and assigning the remaining patients the NE label. In this way, the patient data in each of the five subsections received the E label in one label combination. For each grouping strategy, we ensured that age and sex did not significantly differ by performing a Student t-test for age and a Pearson χ2 test for sex.

#### 2.5.2 Evaluation of dose comparison

The goal of the IBDM dose comparison was to identify the presence and location of the dose discrepancy between test and control dose distributions that is related to a simulated effect (i.e., identify and locate the test volume representing a sensitive sub-volume). To assess the sensitivity to the reference anatomy selection, we performed IBDM analyses for all four reference anatomies with the effect labels characterized by a 50% relative frequency of E and NE labels. To test the sensitivity to the relative frequency of simulated effects, we performed additional IBDM analyses with the best-performing reference anatomy for the 10% and 20% relative frequency label configurations. To test the sensitivity to the individual patient status (i.e., E or NE labeling), we performed IBDM analyses with the best-performing reference anatomy for all five folds of the 20% relative frequency label configuration. Finally, to test the sensitivity to dosimetric characteristics, we repeated all above-described tests for all 27 test configurations (Fig. 3). This resulted in a total of 270 IBDM analyses.
We quantified the performance of the IBDM dose comparison in each of these 270 analyses by calculating the Dice similarity coefficient (DSC) [
• Dice L.R.
Measures of the Amount of Ecologic Association Between Species.
] of the reference patient’s test volume ($VT$) compared to the 99th percentile iso–t-map volume ($V99th$), calculated as.
$DSC=2×VT∩V99thVT+V99th$
(4)

#### 2.5.3 Statistical sensitivity tests

We tested whether the results of the IBDM dose comparisons are associated with several components of the analysis method via multifactor ANOVA models with post-hoc Bonferroni (Dunn) t-tests. We fitted three separate multifactor ANOVA models (Fig. 3). The first model included all analyses performed with the 50% relative frequency and assessed whether the results differed based on the reference-anatomy selection and the three defining features of the test dose distributions (i.e., $S$, $ΔD$, and $σnoise$). The second model included all IBDM analyses performed with the 20% relative frequency and assessed whether the results differed based on adjustments in the individual patient effect status (i.e., the folds). The final ANOVA model included the IBDM analyses performed for all three relative frequencies with the best-performing reference anatomy and assessed differences in the results based on adjustments to the relative frequency (i.e., incidence rate of the simulated effect). All statistical tests were performed with SAS version 9.4 (SAS Institute, Cary, NC).

### 2.6 Modified clinical dose distributions

Although the simple simulated dose distributions enabled us to interrogate specific features of the dose distributions and their effects on IBDM performance when analyzing pediatric data, they served a primarily computational purpose by mimicking only analyses far from the therapeutic treatment field, which are of relevance to late effects like second cancer [
• Diallo I.
• Samand A.
• Quiniou E.
• Chavaudra J.
• et al.
Frequency Distribution of Second Solid Cancer Locations in Relation to the Irradiated Volume among 115 Patients Treated for Childhood Cancer.
] and heart disease [
• Darby S.C.
• Ewertz M.
• McGale P.
• Bennet A.M.
• Blom-Goldman U.
• Brønnum D.
• et al.
Risk of Ischemic Heart Disease in Women after Radiotherapy for Breast Cancer.
,

Bates JE, Shrestha S, Liu Q, Mulrooney DA, Smith SA, Leisenring WM, et al. Low-dose radiation to cardiac substructures and late-onset cardiac disease: A report from the Childhood Cancer Survivor Study (CCSS). J Clin Oncol. 2021;39:10027.

]. Other effects of interest in childhood survivorship, like cognitive deficits, appear to be related to exposures to tissues near the targeted volume [
• Acharya S.
• Guo Y.
• Patni T.
• Li Y.
• Wang C.
• Gargone M.
• et al.
Association Between Brain Substructure Dose and Cognitive Outcomes in Children With Medulloblastoma Treated on SJMB03: A Step Toward Substructure-Informed Planning.
]. In these cases, IBDM analyses must contend with spatially correlated noise (i.e., entrance doses) to identify anatomic volumes in which a threshold dose level separates the dose distributions of patients who experienced an effect of interest from those who did not. To more accurately mimic this situation, we performed a test based on the planned dose distributions from the radiotherapy treatments of the 167 children in the cohort.
Quantifying and assessing the performance of IBDM requires dose distributions that contain a systematic dose difference in a predefined anatomic volume, which correlates with the simulated effect labels (i.e., a known dose difference in a known volume from which the ability of IBDM to identify that volume can be quantified and judged), as we achieved by using the simple simulated dose distributions. To meet these requirements while maintaining clinical realism, we modified the clinical dose distributions by replacing the dose inside each patient’s brainstem contour with the average magnitude of the dose distribution in that patient’s brainstem (Fig. 4). We chose the brainstem as the volume of interest for this experiment because of its close proximity to the target volume, making it a particularly challenging case with which to assess the IBDM method’s sensitivity to spatially correlated noise. Finally, we smoothed the resulting dose distribution with a Gaussian smoothing kernel of σ = 1.0 mm to achieve clinically realistic dose gradients (for proton therapy) around the edges of the brainstem. By these methods, we obtained dose distributions with clinically realistic therapeutic exposures and consistent dose discrepancies in a predefined volume (i.e., the brainstem).
We separated the patient data into new Effect (E) and No Effect (NE) groups for this analysis according to a threshold level among the average brainstem dose magnitudes. We tested IBDM performance with modified clinical dose distributions at the same relative frequencies of E labels as we used for the simplified dose distributions: 50%, 20%, and 10%. We assigned the effect labels by determining the three corresponding threshold dose levels: the median ($D∼$), 80th percentile ($D80th$), and 90th percentile ($D90th$) doses (Fig. 5). We assigned all patient data whose average brainstem doses were equal to or below the threshold to the NE group and all whose average brainstem doses were above the threshold to the E group.
This portion of the study aimed to assess the ability of IBDM to recognize and locate a dose discrepancy according to a threshold level in the presence of clinically realistic, spatially correlated noise (i.e., therapeutic and entrance doses). Therefore, we quantified the performance of IBDM by comparing the reference patient’s brainstem contour to the 99th percentile iso–t-map volume via three metrics: sensitivity, or the fraction of the reference brainstem covered by the 99th percentile iso–t-map volume; positive predictive value (PPV), or the fraction of the 99th percentile iso–t-map located within the reference brainstem, and the DSC (see Equation (4)).

## 3. Results

### 3.1 Evaluation of spatial normalization

Fig. 6 shows the original reference CT images and average treatment-planning CT images at two distinct points in the IBDM workflow: before and after DIR to the best-performing reference anatomy. Qualitatively, the improved clarity after DIR, especially in the regions shown with the soft-tissue window-level depicting the finer brain structure, indicated lower intensity variability among the mapped images and an overall high accuracy of the DIR, despite large age-related variations in anatomy. The corresponding images for all four reference patients (see Section 2.3) are supplied in Appendix A. Quantitatively, Table 2 shows the metrics computed in surface-distance and center-of-mass analyses. These metrics revealed that the discrepancies between the individual subject anatomy propagated to the reference coordinate system and the reference anatomy were small (i.e., $DTH,95$ < 9 mm, $DTavg$ < 3 mm, and $ΔCOM$ < 6 mm), with a slight improvement for the reference anatomy selected on the basis of the median brain volume. Relatedly, Table 3 shows the dimensions of the Gaussian blurring kernel used to take the effects of DIR uncertainties into account in the voxelized dose comparison for all four reference anatomies. These dimensions were also small (i.e., SD < 4 mm).
Table 2Quantitative Evaluation of Deformable Image Registration. Measures of the deformable-image-registration accuracy include the 95% Hausdorff ($DTH,95$) and average ($DTavg$) distances between surfaces (see Eqs. (1), (2)) and the Euclidian norm of the center of mass displacement ($ΔCOMavg$) (see Equation 3). All metrics are reported as the value ± one standard deviation. Bold numbers show the lowest metric (i.e., best performance) for each row.
Reference Anatomy Selection Method
Volume of InterestMetricAge (Male)Age (Female)Brain VolumeTarget Volume
Brainstem$DTH,95$ mm5.44 ± 2.516.95 ± 1.524.78 ± 1.686.99 ± 2.74
$DTavg$ mm2.15 ± 1.062.79 ± 0.601.90 ± 0.612.44 ± 0.95
$ΔCOM$ mm2.60 ± 2.394.42 ± 1.042.51 ± 1.543.19 ± 1.66
Left Cochlea$DTH,95$ mm2.10 ± 3.162.57 ± 2.181.59 ± 0.691.54 ± 1.02
$DTavg$ mm1.05 ± 2.830.99 ± 1.900.68 ± 0.460.71 ± 0.66
$ΔCOM$ mm1.44 ± 3.161.58 ± 2.230.96 ± 0.801.00 ± 1.16
Right Cochlea$DTH,95$ mm1.99 ± 3.121.89 ± 1.231.80 ± 1.051.98 ± 2.69
$DTavg$ mm1.02 ± 2.900.72 ± 0.890.76 ± 0.651.05 ± 2.34
$ΔCOM$ mm1.26 ± 3.221.09 ± 1.210.89 ± 0.961.30 ± 2.80
Left Optic Nerve$DTH,95$ mm8.06 ± 4.846.50 ± 3.075.03 ± 3.144.80 ± 2.30
$DTavg$ mm2.19 ± 2.862.17 ± 2.281.57 ± 1.361.72 ± 1.50
$ΔCOM$ mm3.58 ± 3.575.52 ± 3.312.97 ± 2.062.70 ± 1.89
Right Optic Nerve$DTH,95$ mm8.12 ± 4.845.63 ± 2.615.22 ± 3.584.86 ± 2.35
$DTavg$ mm2.30 ± 2.531.83 ± 1.761.55 ± 1.301.80 ± 1.51
$ΔCOM$ mm3.53 ± 3.294.60 ± 3.002.99 ± 2.222.87 ± 2.07
Table 3Dimensions of the Gaussian Blurring Kernel. Dimensions used to take the effects of uncertainties associated with mapping individual patient anatomies to a reference anatomy into account for each of the four reference anatomies considered. Bold values represent the smallest among the four reference patients.
Reference AnatomiesAnteroposterior (mm)Left-Right (mm)Superior-Inferior (mm)
Median Age (F)2.672.632.11
Median Age (M)3.251.682.02
Median Brain Volume1.851.471.34
Median CTV1.661.931.90
Abbreviations: F – female; M – male; CTV – clinical target volume

### 3.2 Evaluation of dose comparison

The various strategies of assigning E and NE labels to patient data did not change the distribution of age and sex (p > 0.05 for all). IBDM analyses among the 270 test cases were largely successful. Of the 270 tests, 219 (81%) identified a family-wise significant difference between the dose distributions of the E and NE groups at the α = 0.01 significance level. All comparisons that failed the family-wise test (i.e., 51 of 270) involved either the smallest test volume (31 of 51, 60%) (i.e., the volume simulating a small sensitive sub-volume of interest; $S$ = 0.5 cm) or the smallest introduced dose difference (49 of 51, 96%) (i.e., $ΔD$ = 0.1%), or both (29 of 51, 57%). DSC analyses from voxel-wise comparisons revealed a peak in frequency at DSC magnitudes between 0.9 and 1. Fig. 7 shows representative examples of the iso–t-maps for IBDM comparisons in simple dose distributions, along with the corresponding DSC magnitudes. These simple dose distributions were characterized by a dichotomous dose distribution with a constant average dose difference in the test volume and no average dose difference elsewhere. Therefore, the iso–t-maps are dichotomized and lack a smooth transition through significance levels. Appendix B contains the corresponding images for all 270 tests.

### 3.3 Statistical sensitivity tests

Fig. 8 shows violin plots of DSC magnitudes from all IBDM analyses (i.e., including those in which the family-wise comparison failed to recognize a difference) that were partitioned by the various parameters considered in the sensitivity tests. Fig. 8a-d show the results of the sensitivity tests included in ANOVA 1 (shown in Fig. 3). The IBDM results, as quantified by DSC, were insensitive to the method of reference-anatomy selection (Fig. 8a). This suggests that these methods are robust to the method of reference-anatomy selection despite large variations in age-related anatomy within the cohort. The DSC magnitudes were also sensitive to the test-volume side length ($S$) (Fig. 8b), insensitive to the magnitude of the dose discrepancy for $ΔD$ greater than or equal to 1% (Fig. 8c) and significantly improved for the lowest noise level of 2% (Fig. 8d). Fig. 8e shows the results for ANOVA 2 (shown in Fig. 3). This plot reveals that the IBDM results were insensitive to the individual patient labels. This suggests that the IBDM method reliably assesses dosimetric discrepancies and is robust to inter-individual anatomic variations in children. Finally, Fig. 8f shows the results for ANOVA 3 (shown in Fig. 3). This plot indicates that the IBDM results were significantly worse with the 10% relative frequency of the E labels but were there was no difference between 20% and 50% relative frequencies. This suggests that a lower bound exists for the incidence rate of effects of interest that are suitable for this type of analysis, i.e., 36 events for 20%.

### 3.4 Modified clinical dose distributions

Table 4 lists the calculated metrics comparing the 99th percentile iso–t-volume to the reference brainstem from all three tests in modified clinical dose distributions. These values show that IBDM successfully identified systematic differences between the dose distributions of the E and NE groups with a family-wise p ≤ 0.001 in all cases. The calculated sensitivities reveal that IBDM consistently recognized the dose differences present in the brainstem, even in the test with the lowest power (i.e., relative frequency of 10%). The PPV and DSC, however, were relatively lower, only reaching a maximum of 0.25 and 0.40, respectively. Fig. 9 shows the iso–t-maps for the three tests in modified clinical dose distributions. These images show that the 99% iso–t-volume covered the brainstem (i.e., the volume in which we introduced a systematic dose difference between the E and NE groups) despite the presence of clinically realistic, spatially correlated noise from the therapeutic dose distribution. This significant volume, however, also encompassed additional voxels outside of the brainstem due to correlations of the brainstem dose with systematic differences in the target coverage and beam arrangement for the two treatment protocols included in this study. Nevertheless, the most critical point is that the IBDM method must identify the entire sensitive sub-volume and not miss part of the volume of interest because of unavoidable noise. Interestingly, the DSC goes up with lower event rate, showing the potential of IBDM to overestimate sensitive regions because of correlations in the data.
Table 4Quantitative Evaluation in Modified Clinical Dose Distributions. Table of metrics quantifying the performance of the image-based data mining analysis in modified clinical dose distributions. Metrics include the family-wise p-value, sensitivity, positive predictive value (PPV), and dice similarity coefficient (DSC, see Eq. 4) comparing the 99th percentile iso-t-volume to the brainstem of the reference anatomy. Higher magnitudes of sensitivity, PPV, and DSC indicate improved performance with 1.00 being the maximum attainable magnitude for all three metrics.
Relative Frequency (%)pSensitivityPPVDSC
50<0.0011.000.040.07
20<0.0011.000.100.18
10<0.0010.980.250.40

## 4. Discussion

We tested the applicability of IBDM to pediatric data by using the imaging and treatment data from a cohort of 167 children who previously received proton therapy for primary brain tumors at St. Jude Children’s Research Hospital. We assessed the strengths and limitations of IBDM with simulated effects and simple and modified clinical dose distributions. Our findings revealed that IBDM, for a structure that is central in the brain, is applicable to pediatric data despite large age-related anatomic variation and the small patient numbers and incidence rates characteristic of childhood late-effects research. Our findings also show that IBDM reliably identifies systematic dose differences as small as 1%, regardless of the selection of reference anatomy. The performance, however, depends on the magnitude of random noise, the relative frequency of the effect of interest in the cohort, and the size of the sensitive sub-volume (i.e., the side length, $S$, of the test volume in this study). It should be noted, however, that the DSC itself is sensitive to the volume being compared with lower DSC magnitudes more likely for smaller volumes. Therefore, although our IBDM results were sensitive to $S$, the extent to which the apparent sensitivity is attributable to the IBDM method rather than the inherent nature of DSC is unclear.
Although our study is the first to evaluate the feasibility of the complete IBDM workflow in pediatric CT data, the spatial normalization step can be compared to previous CT-based studies implementing IBDM in adult cohorts of lung cancer patients [
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
,
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
,
• McWilliam A.
• Dootson C.
• Graham L.
• Banfill K.
• Abravan A.
• van Herk M.
Dose surface maps of the heart can identify regions associated with worse survival for lung cancer patients treated with radiotherapy.
], head and neck cancer patients [
• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
] and a recent study evaluating spatial normalization in data from children [
• Veiga C.
• Lim P.
• Anaya V.M.
• Chandy E.
• D’Souza D.
• et al.
Atlas construction and spatial normalisation to facilitate radiation-induced late effects research in childhood cancer.
]. Beasley et al. found a DIR uncertainty (SD of center of mass) of 4.0, 0.34 and 0.30 mm in the left–right, anteroposterior and superior-inferior directions for the muscles of mastication [
• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
]. We found similar DIR uncertainty values for our selected brain structures, with slightly less deviation, most likely because of the rigidity of structures contained within the skull. Veiga et al. found $DTavg$ and $ΔCOMavg$ values of 0.7 ± 1.0 mm and 2.9 ± 5.7 mm, respectively, for DIR of the central nervous system in children [
• Veiga C.
• Lim P.
• Anaya V.M.
• Chandy E.
• D’Souza D.
• et al.
Atlas construction and spatial normalisation to facilitate radiation-induced late effects research in childhood cancer.
]. Although our $ΔCOMavg$ was consistent with that of Veiga et al., we found a slightly higher $DTavg$ (Table 1). It should be noted, however, that we calculated $DTavg$ as an unsigned average and it is unknown whether the metric reported by Veiga et al. is signed or unsigned. Furthermore, we assessed DIR accuracy by using contours, which are widely known to depend on inter-observer variation [
• Wright J.L.
• Yom S.S.
• Awan M.J.
• Dawes S.
• Fischer-Valuck B.
• Kudner R.
• et al.
Standardizing Normal Tissue Contouring for Radiation Therapy Treatment Planning: An ASTRO Consensus Paper.
]. The effect of this inter-observer variability is somewhat less severe in $ΔCOMavg$ assessment but has a strong effect on $DTavg$ and may also contribute to the differing results from Veiga et al. and our study.
Our study has several notable strengths. First, it represents the first end-to-end test of voxel-based analysis methods in a pediatric cohort. Our study also used well-characterized data from a large cohort of children with cancer. The data were collected under two successive clinical trials, thereby minimizing potential confounding factors related to imaging protocols, structure contours, and treatment variations. We also minimized confounding factors by using simulated dose distributions. This yielded complete control of the location, volume, and magnitude of dose deviations analyzed, which is a necessary step to objectively quantify the performance of the IBDM method and assess its sensitivity to various dosimetric characteristics.
Our study also had some limitations. First, our use of default DIR algorithm parameters may have limited the achievable DIR accuracy. Our metrics assessing DIR accuracy, however, were comparable to those in other IBDM studies. Future studies are needed to identify the optimal DIR parameters for pediatric IBDM analyses. Additionally, the ability of the registration algorithm to accurately match structures within the brain may be further limited by the lack of soft tissue contrast in CT imaging. This lack of contrast may also account for the small number of outliers (4/167 patients) that we observed with large registration discrepancies in which no obvious reason for registration failure was apparent. We precisely assessed the DIR accuracy, however, by propagating the spatial normalization to structure contours that spanned the brain volume, thereby minimizing the likelihood of overlooking gross deformation errors and still found a reasonable registration accuracy (< 6 mm in the best-performing reference), despite the low soft-tissue contrast of CT. Furthermore, if a few cases within a large cohort lead to registration failures, those cases can be discarded without affecting the statistical power of the analysis. Incorporating MR or other multi-modal imaging techniques would increase the information available to the registration algorithm and would enable a more comprehensive assessment of the registered brain structures, and likely a higher registration accuracy, as has been suggested by two methodological studies [
• Monti S.
• Paganelli C.
• Buizza G.
• Preda L.
• Valvo F.
• Baroni G.
• et al.
A novel framework for spatial normalization of dose distributions in voxel-based analyses of brain irradiation outcomes.
,
• Skaarup M.
• Lundemann M.J.
• Darkner S.
• Jørgensen M.
• Marner L.
• Mirkovic D.
• et al.
A framework for voxel-based assessment of biological effect after proton radiotherapy in pediatric brain cancer patients using multi-modal imaging.
] using smaller cohorts (i.e., 50 adults and 6 children, respectively). However, the additional required image registration step (i.e., MR to CT) would also add additional uncertainty. Despite this limitation of CT-based IBDM, our results confirm the feasibility of applying IBDM in data from children, for the first time, and provide a reference point for future studies evaluating IBDM using MR images. We are currently generalizing the methods reported here for MR-based spatial normalization to perform further IBDM studies in children with brain tumors. Another limitation of this study is its computational nature, which only addresses the feasibility of IBDM in pediatric cohorts without providing new clinical results. Thorough benchmarking, however, is required to understand future clinical results and allow them to guide clinical practice. Having achieved this proof of principle, future studies will be able to apply these IBDM methods to analyze the dose response for some of the most common side effects of radiotherapy treatments among childhood cancer survivors with various central nervous system disorders.
Because childhood cancer is rare, large cohorts such as those used in previous adult IBDM studies may not be achievable. Our study shows that the smaller cohort size typical of pediatric cancers (i.e., often at best only a few hundred patients) provides a large enough sample size to successfully implement the IBDM methods for incidence rates representative of common effects of interest among childhood cancer survivors (i.e., 20% and 10%). Our findings suggest that cohorts of similar size and power are feasible for successful implementation of IBDM and pave the way for future studies to establish this. The specific sample size required will depend on several factors, such as the relative incidence rate of the effect of interest among the cohort, the success of the spatial normalization and the dynamic range represented in the dose distributions.
Our findings in modified clinical dose distributions highlight a challenge common to all data-mining endeavors in that the results rarely reveal the desired information cleanly. Instead, data mining results require expert interpretation to realize the valuable information. Our data were derived from children who were treated in one of two clinical protocols, with approximately half of the children in each protocol. The children treated in one protocol tended to have slightly higher target doses delivered by three treatment beams – two lateral beams and one apex beam. In contrast, the children treated in the other protocol tended to have relatively lower target doses delivered by two lateral treatment beams alone. The higher target doses and apex beam were also associated with higher brainstem doses; therefore, the grouping for a 50% relative frequency of E labels generally divided the cohort of children according to their treatment protocol. This understanding helps explain the lower PPV we observed in the 50% relative frequency test (Table 4). Although we expected significant results outside of the brainstem in the modified clinical dose distribution experiment, which involved the spatially correlated therapeutic doses, the spatial distribution of these significant results for the 50% relative frequency test (Fig. 9) clearly demonstrated that the additional significant voxels outside of the brainstem were largely located in the target volume and along the beam paths (i.e., where the protocol-related systematic dose differences existed). The 20% and 10% relative-frequency groupings, however, more evenly mixed the data obtained from the children in the two protocols in the NE group. Therefore, these analyses were less susceptible to the protocol-related covariates. This highlights the critical importance of having a deep understanding of the dataset under consideration. Relatedly, Fig. 9 also reveals a trend of improving DSC with increasing significance level. Therefore, another potential avenue of excluding erroneously identified volumes is by assessing significance at higher levels. A practical limitation, however, is that evaluating higher significance requires a large number of permutations, and therefore prohibitively larger computation time and data-storage requirements. Future work could explore leveraging methods to extrapolate to smaller p-values without the need for additional permutations [
• Knijnenburg T.A.
• Wessels L.F.A.
• Reinders M.J.T.
• Shmulevich I.
Fewer permutations, more accurate P-values.
].
A further consideration is that the target volume for all children in this cohort was located in the same, well-defined region of the brain. Hence, the high-dose volume occurred in the same location for all patients after spatial normalization. This contrasts with other tumor sites in which IBDM has been previously applied, where tumor location can be highly varied within the anatomy of interest (e.g., lung) and spatial correlations between the dose distribution and heterogeneous anatomic locations may impact IBDM analyses [
• Palma G.
• Monti S.
• Pacelli R.
• Liao Z.
• Deasy J.O.
• Mohan R.
• et al.
Radiation Pneumonitis in Thoracic Cancer Patients: Multi-Center Voxel-Based Analysis.
,
• Cella L.
• Monti S.
• Xu T.
• Liuzzi R.
• Stanzione A.
• Durante M.
• et al.
Probing thoracic dose patterns associated to pericardial effusion and mortality in patients treated with photons and protons for locally advanced non-small-cell lung cancer.
]. Further work to assess the effect of tumor location consistency on IBDM success is required.
This raises an important consideration in choosing a dataset that is well suited to IBDM analyses: possible covariates and confounders. This consideration will guide the decision of whether the effects of confounders on the results can be interpreted and taken into account manually (as was done here) or explicitly considered with methods like per-voxel cox regression [
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
] or whether a different dataset containing confounders that are only weakly associated with the effect of interest should be sought. Other potential considerations in determining the suitability of a cohort include the presence and effects of healthy tissue deformations (e.g., from tumor growth or surgery) on DIR accuracy.
Finally, it should also be noted that IBDM results indicate correlation and should not be confused with causation. The resulting iso–t-maps provide an assessment of differences between dose distributions that are correlated with differences in effect status. Whether the indicated correlation occurs as a direct association between the dose and the effect of interest or is due to a confounding factor must be interpreted carefully. Even if the results suggest that the correlation is a direct association between the dose and the effect of interest, further studies other than IBDM are required to establish a causal relation between the absorbed dose and the biologic effect.

## 5. Conclusion

This study is the first to comprehensively test the applicability, strengths, and limitations of voxel-based analysis methods, i.e., IBDM, with pediatric data. Our findings demonstrated that such voxel-based analysis methods apply to data from pediatric cohorts, despite large age-related anatomic variations. The results of this study can guide future studies leveraging IBDM to improve understanding of radiation dose–response relationships in children by guiding their selection of an ideal cohort. Looking forward, such studies have the potential to improve radiotherapy by informing treatment planning to reduce treatment-related toxicities and improve quality of life among the survivors of childhood cancer.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

### Acknowledgements

The authors acknowledge the support of Nisha Badders, PhD, in editing this manuscript.
Funding: This publication is based on research supported by the American Society for Radiation Oncology (ASTRO) and the American Association of Physicists in Medicine (AAPM). Lydia J Wilson, Austin M Faught, Thomas E Merchant, and Yimei Li were funded by the American Lebanese Syrian Associated Charities. Marcel van Herk and Eliana Vasquez Osorio were supported by NIHR Manchester Biomedical Research Centre. Andrew Green was supported by Cancer Research UK via funding to the Manchester Cancer Research Centre [C147/A25254]. Abigail Bryce-Atkinson was sponsored by a SU2C-CRUK Pediatric Cancer New Discoveries Challenge Team Grant (Grant Number: SU2C#RT6186). Marianne Aznar was supported by Research England under the Engineering and Physical Sciences Research Council fellowship funding stream (EP/T028017/1).

### Data statement

The data that support the findings of this study are available from the corresponding author, LJW, upon reasonable request

## Supplementary data

• Appendix A
• Appendix B

## References

• Noone A.
• Krapcho M.
• Miller D.
• Brest A.
• Yu M.
• et al.
SEER Cancer Statistics Review, 1975–2016.
National Cancer Institute, Bethesda, MD2018
• Hudson M.S.
Jude LIFE Study Population.
St. Jude Children's Research Hospital, Department of Epidemiology2019
• Suh E.
• Stratton K.L.
• Leisenring W.M.
• Nathan P.C.
• Ford J.S.
• Freyer D.R.
• et al.
Late mortality and chronic health conditions in long-term survivors of early-adolescent and young adult cancers: a retrospective cohort analysis from the Childhood Cancer Survivor Study.
Lancet Oncol. 2020; 21: 421-435
• Wilson L.J.
• Newhauser W.D.
Generalized approach for radiotherapy treatment planning by optimizing projected health outcome: preliminary results for prostate radiotherapy patients.
Phys Med Biol. 2021; 66
• Langendijk J.A.
• Lambin P.
• De Ruysscher D.
• Widder J.
• Bos M.
• Verheij M.
Selection of patients for radiotherapy with protons aiming at reduction of side effects: The model-based approach.
• Rechner L.A.
• Eley J.G.
• Howell R.M.
• Zhang R.
• Mirkovic D.
• Newhauser W.D.
Risk-optimized proton therapy to minimize radiogenic second cancers.
Phys Med Biol. 2015; 60: 3999-4013
• Wu Q.
• Mohan R.
Algorithms and functionality of an intensity modulated radiotherapy optimization system.
Med Phys. 2000; 27: 701-711
• ASTRO
Quantitative analyses of normal tissue effects in the clinic (QUANTEC).
American Society for Radiation Oncology, New York, NY2010
• Siva S.
• MacManus M.P.
• Martin R.F.
• Martin O.A.
Cancer Lett. 2015; 356: 82-90
• Schwartz D.L.
• Hutcheson K.
• Barringer D.
• Tucker S.L.
• Kies M.
• Holsinger F.C.
• et al.
Candidate Dosimetric Predictors of Long-Term Swallowing Dysfunction After Oropharyngeal Intensity-Modulated Radiotherapy.
Int J Radiat Oncol. 2010; 78: 1356-1365
• Toussaint L.
• Indelicato D.J.
• Stokkevåg C.H.
• Pedro C.
• Mikkelsen R.
• et al.
Radiation doses to brain substructures associated with cognition in radiotherapy of pediatric brain tumors.
Acta Oncol. 2019; 58: 1457-1462
• Chiavassa S.
• Bessieres I.
• Edouard M.
• Mathot M.
• Moignier A.
Complexity metrics for IMRT and VMAT plans: a review of current literature and applications.
Brit J Radiol. 2019; 92: 20190270
• Palma G.
• Monti S.
• Cella L.
Voxel-based analysis in radiation oncology: A methodological cookbook.
Physica Med. 2020; 69: 192-204
• Chen C.
• Witte M.
• Heemsbergen W.
• van Herk M.
Multiple comparisons permutation test for image based data mining in radiotherapy.
• Acosta O.
• Drean G.
• Ospina J.D.
• Simon A.
• Haigron P.
• Lafond C.
• et al.
Voxel-based population analysis for correlating local dose and rectal toxicity in prostate cancer radiotherapy.
Phys Med Biol. 2013; 58: 2581-2595
• McWilliam A.
• Kennedy J.
• Hodgson C.
• Vasquez Osorio E.
• Faivre-Finn C.
• van Herk M.
Radiation dose to heart base linked with poorer survival in lung cancer patients.
Eur J Cancer. 2017; 85: 106-113
• Palma G.
• Monti S.
• D'Avino V.
• Conson M.
• Liuzzi R.
• Pressello M.C.
• et al.
A Voxel-Based Approach to Explore Local Dose Differences Associated With Radiation-Induced Lung Damage.
Int J Radiat Oncol Biol Phys. 2016; 96: 127-133
1. Witte MG, Heemsbergen WD, Bohoslavsky R, Pos FJ, Al-Mamgani A, Lebesque JV, et al. Relating Dose Outside the Prostate With Freedom From Failure in the Dutch Trial 68 Gy vs. 78 Gy. Int J Radiat Oncol. 2010;77:131-8.

• Beasley W.
• Thor M.
• McWilliam A.
• Green A.
• Mackay R.
• Slevin N.
• et al.
Image-based Data Mining to Probe Dosimetric Correlates of Radiation-induced Trismus.
Int J Radiat Oncol Biol Phys. 2018; 102: 1330-1338
• Noone A.
• Krapcho M.
• Miller D.
• Bishop K.
• Kosary C.
• et al.
SEER Cancer Statistics Review, 1975–2014.
National Cancer Institute, Bethesda, MD2017
• Mulrooney D.A.
• Armstrong G.T.
• Huang S.
• Ness K.K.
• Ehrhardt M.J.
• Joshi V.M.
• et al.
Cardiac Outcomes in Adult Survivors of Childhood Cancer Exposed to Cardiotoxic Therapy: A Cross-sectional Study.
Ann Intern Med. 2016; 164: 93-101
• Facoetti A.
• Barcellini A.
• Valvo F.
• Pullia M.
The Role of Particle Therapy in the Risk of Radio-induced Second Tumors: A Review of the Literature.
Anticancer Res. 2019; 39: 4613-4617
• Darby S.C.
• Ewertz M.
• McGale P.
• Bennet A.M.
• Blom-Goldman U.
• Brønnum D.
• et al.
Risk of Ischemic Heart Disease in Women after Radiotherapy for Breast Cancer.
N Engl J Med. 2013; 368: 987-998
2. Bates JE, Shrestha S, Liu Q, Mulrooney DA, Smith SA, Leisenring WM, et al. Low-dose radiation to cardiac substructures and late-onset cardiac disease: A report from the Childhood Cancer Survivor Study (CCSS). J Clin Oncol. 2021;39:10027.

• Diallo I.
• Samand A.
• Quiniou E.
• Chavaudra J.
• et al.
Frequency Distribution of Second Solid Cancer Locations in Relation to the Irradiated Volume among 115 Patients Treated for Childhood Cancer.
Int J Radiat Oncol. 2009; 74: 876-883
• Palma G.
• Monti S.
• Pacelli R.
• Liao Z.
• Deasy J.O.
• Mohan R.
• et al.
Radiation Pneumonitis in Thoracic Cancer Patients: Multi-Center Voxel-Based Analysis.
Cancers. 2021; 13: 3553
• Veiga C.
• Lim P.
• Anaya V.M.
• Chandy E.
• D’Souza D.
• et al.
Atlas construction and spatial normalisation to facilitate radiation-induced late effects research in childhood cancer.
Phys Med Biol. 2021; 66: 105005
• Monti S.
• Paganelli C.
• Buizza G.
• Preda L.
• Valvo F.
• Baroni G.
• et al.
A novel framework for spatial normalization of dose distributions in voxel-based analyses of brain irradiation outcomes.
Phys Med. 2020; 69: 164-169
• Skaarup M.
• Lundemann M.J.
• Darkner S.
• Jørgensen M.
• Marner L.
• Mirkovic D.
• et al.
A framework for voxel-based assessment of biological effect after proton radiotherapy in pediatric brain cancer patients using multi-modal imaging.
Med Phys. 2021; 48: 4110-4121
• Green A.
• Vasquez Osorio E.
• Aznar M.C.
• McWilliam A.
• van Herk M.
Image Based Data Mining Using Per-voxel Cox Regression.
Front Oncol. 2020; 10
• McWilliam A.
• Dootson C.
• Graham L.
• Banfill K.
• Abravan A.
• van Herk M.
Dose surface maps of the heart can identify regions associated with worse survival for lung cancer patients treated with radiotherapy.
Phys Imag Radiat Oncol. 2020; 15: 46-51
• Modat M.
• Ridgway G.R.
• Taylor Z.A.
• Lehmann M.
• Barnes J.
• Hawkes D.J.
• et al.
Fast free-form deformation using graphics processing units.
Comput Methods Programs Biomed. 2010; 98: 278-284
• Camargo A.
• Azuaje F.
• Wang H.
• Zheng H.
Permutation - based statistical tests for multiple hypotheses.
Source Code Biol Med. 2008; 3: 15
• Dudoit S.
• Shaffer J.P.
• Boldrick J.C.
Multiple Hypothesis Testing in Microarray Experiments.
Statist Sci. 2003; 18: 71-103
• Belmonte M.
• Yurgelun-Todd D.
Permutation testing made practical for functional magnetic resonance image analysis.
IEEE Trans Med Imaging. 2001; 20: 243-248
• Groppe D.M.
• Urbach T.P.
• Kutas M.
Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review.
Psychophysiology. 2011; 48: 1711-1725
• Kim H.S.
• Park S.B.
• Lo S.S.
• Monroe J.I.
• Sohn J.W.
Bidirectional local distance measure for comparing segmentations.
Med Phys. 2012; 39: 6779-6790
• Deasy J.O.
• Blanco A.I.
• Clark V.H.
CERR: A computational environment for radiotherapy research.
Med Phys. 2003; 30: 979-985
• Newhauser W.D.
• Schneider C.
• Wilson L.
• Shrestha S.
• Donahue W.
A review of analytical models of stray radiation exposures from photon- and proton-beam radiotherapies.
Radiat Prot Dosim. 2018; 180: 245-251
• Wilson L.J.
• Pirlepesov F.
• Moskvin V.
• Li Z.
• Guo Y.
• Li Y.
• et al.
Proton therapy delivery method affects dose-averaged linear energy transfer in patients.
Phys Med Biol. 2021; 66: 074003
• Wilson L.J.
• Newhauser W.D.
• Schneider C.W.
• Kamp F.
• Reiner M.
• Martins J.C.
• et al.
Method to quickly and accurately calculate absorbed dose from therapeutic and stray photon exposures throughout the entire body in individual patients.
Med Phys. 2020; 47: 2254-2266
• Hauri P.
• Hälg R.A.
• Besserer J.
• Schneider U.
A general model for stray dose calculation of static and intensity-modulated photon radiation.
Med Phys. 2016; 43: 1955-1968
• Moyers M.F.
• Toth T.L.
• Chvetsov A.V.
• Unkelbach J.
• Mohan R.
• et al.
202 - Physical Uncertainties in the Planning and Delivery of Light Ion Beam Treatments.
Am Assoc Physicists Med. 2020;
• Friedman D.L.
• Whitton J.
• Leisenring W.
• Mertens A.C.
• Hammond S.
• Stovall M.
• et al.
Subsequent Neoplasms in 5-Year Survivors of Childhood Cancer: The Childhood Cancer Survivor Study.
JNCI. 2010; 102: 1083-1095
• Kralik S.F.
• Watson G.A.
• Shih C.S.
• Ho C.Y.
• Finke W.
• Buchsbaum J.
Radiation-Induced Large Vessel Cerebral Vasculopathy in Pediatric Patients With Brain Tumors Treated With Proton Radiation Therapy.
Int J Radiat Oncol Biol Phys. 2017; 99: 817-824
• Lucas J.
• LeVine D.
• Ismael Y.
• Hsu C.-Y.
• Darrow K.
• Faught A.
• et al.
Rare-12. Vasculopathy in pediatric craniopharyngioma patients treated with surgery and radiotherapy.
Neuro-Oncol. 2020; 22 (iii444-iii)
• Anguita D.
• Ghio A.
• Ridella S.
• Sterpi D.
K-fold cross validation error rate estimate in support vector machines.
Proc Int Conf Data Mining. 2009;
• Dice L.R.
Measures of the Amount of Ecologic Association Between Species.
Ecology. 1945; 26: 297-302
• Acharya S.
• Guo Y.
• Patni T.
• Li Y.
• Wang C.
• Gargone M.
• et al.
Association Between Brain Substructure Dose and Cognitive Outcomes in Children With Medulloblastoma Treated on SJMB03: A Step Toward Substructure-Informed Planning.
J Clin Oncol. 2022; 40: 83-95
• Wright J.L.
• Yom S.S.
• Awan M.J.
• Dawes S.
• Fischer-Valuck B.
• Kudner R.
• et al.
Standardizing Normal Tissue Contouring for Radiation Therapy Treatment Planning: An ASTRO Consensus Paper.
Pract Radiat Oncol. 2019; 9: 65-72
• Knijnenburg T.A.
• Wessels L.F.A.
• Reinders M.J.T.
• Shmulevich I.
Fewer permutations, more accurate P-values.
Bioinformatics. 2009; 25: i161-i168
• Cella L.
• Monti S.
• Xu T.
• Liuzzi R.
• Stanzione A.
• Durante M.
• et al.
Probing thoracic dose patterns associated to pericardial effusion and mortality in patients treated with photons and protons for locally advanced non-small-cell lung cancer.