Evaluation of a multi-atlas CT synthesis approach for MRI-only radiotherapy treatment planning

Highlights • Establishing MRI-only RTP workflows requires synthetic CTs for dose calculation.• This study evaluates the feasibility of using a multi-atlas CT synthesis approach.• The proposed method was validated on head and neck and prostate cancer patients.• Results showed an accurate bone estimation for future patient positioning.• Results showed that synthetic CTs are suitable to perform clinical dose calculations.


Introduction
Cancer treatment with radiotherapy requires information regarding the patient's anatomy, such as the organs and tumour's location and the tissue attenuation properties necessary for dose calculations. X-ray computed tomography (CT) is the current gold standard for radiotherapy treatment planning (RTP) mainly because CT intensity values expressed in Hounsfield units (HU) can easily be correlated with tissue electron densities. However, because of its limited soft-tissue contrast, CT imaging can prevent precise and reliable tumour location, particularly in regions such as the brain, head and neck (H&N) or prostate. To overcome this limitation, magnetic resonance imaging (MRI) is being integrated into the radiotherapy workflow. By virtue of their excellent soft-tissue contrast, MR images improve the target volume definition [1,2]. Avoiding radiation during the imaging protocol is also a major advantage.
The acquisition of both CT and MR images of the patient is already part of the clinical workflow for some indications. MR data is used to define the target volume (i.e. the tumour) and CT data to plan the treatment. Image registration is used to define a spatial relationship between the two images allowing any manual contouring from the MRI to be mapped to the planning CT. However with this approach, the workflow is extremely dependent on the quality of the image registration [3,4]. The increased cost and workload for clinicians, when using two different image modalities, is also undesirable [5].
Due to these limitations, there is a growing interest in using an MRI-only RTP workflow. However, as no fundamental relationship between MR image intensities and electron density values exists [6], an accurate method to derive CT equivalent information from MR data (referred to as synthetic CT) is required to perform dose calculations. To assess the feasibility of MR-based treatment planning, the first experiments consisted of assigning single bulk densities to tissue classes (such as bone, air and soft-tissue) delineated either from a CT image [7][8][9] or manually from an MR image [10], and then comparing the resulting synthetic CT-based plan to the original CT-based plan. For both H&N and prostate target volumes, dosimetric errors were reported to be 1-2% different from the CTbased dose calculation [8][9][10]. Korhonen et al. [11] then explored the possibility of assigning subject-specific density values to the bone class by manually segmenting an MR image and converting the MRI intensity values to HUs using a second-order polynomial model. They showed that this technique improved the plan accuracy when compared with single bulk density assignment. Although these studies showed promising results, their use is limited by the manual delineation step, making them non-viable in an online workflow. Automatic delineation is challenging as bone is not easily distinguishable in traditional MR sequences, due to bone's short T2 ⁄ relaxation time. Despite these challenges, bulk density assignment approaches have recently been made available in clinical RTP software platforms, such as the MRCAT [12] by Philips (Philips, Best, Netherlands), and are already used in practice for cone beam CT-based dose calculations [13] and to account for tissue heterogeneities (i.e. presence of metal implants).
Other methods exist to obtain synthetic CT (sCT) images automatically from MR images and many have been applied to RTP. Hsu et al. [14] used a fuzzy c-means algorithm to segment a set of structural MR images into five tissues classes. The sCT was generated by assigning relative attenuation coefficients with weights based on the probability that each class exists at a given location. Jonsson et al. [15] applied the method developed by Johansson et al. [16] where a sCT was obtained from a Gaussian mixture regression model linking the MRI intensity values to the CT HUs. Another family of methods, the atlas-based methods, rely on a single template [17] or a database of MR and CT image pairs [18][19][20][21][22][23]. First, a non-rigid registration between the atlas and test subject MR images is performed. Then, the same transformation is applied to the associated CT images and finally, for the multi-atlas methods, the registered CT images are fused to generate the final sCT. The fusion can be obtained by computing the voxelwise median [21], using a probabilistic Bayesian framework [22], an arithmetic mean process or pattern recognition with Gaussian process [23] or a local image similarity measure [18,19]. Instead of using a database of images, Andreasen et al. [24] employed a dictionary of MR and CT patches. The sCT was predicted by extracting patches from the test subject MRI, running an intensity-based nearest neighbour search in the patch database and fusing the selected CT patches using a similarity-weighted average. Combining segmentation and use of a template database, Siversson et al. [25] proposed a statistical decomposition algorithm to automatically generate sCTs.
The multi-atlas CT synthesis approach evaluated in this work was first developed for brain applications [26,27] and then extended to H&N cancer [18]. In this paper, we present a thorough validation of the method, not only for H&N but also for patients with prostate cancer. The main difference with most of the other multi-atlas methods [21][22][23] is that the fusion of the atlases is based on the local similarity between each atlas and the test subject. The difference with Dowling et al. [19] is that the proposed approach guarantees a good initial alignment between atlas and test subjects due to a robust affine inter-subject registration process, allows the synthesis from multiple MR sequences and refines the synthesis via an iterative process.
In this paper, we assess the feasibility of implementing our multi-atlas approach into clinical MRI-based RTP on both H&N and prostate cancer patients. We evaluate its performance, both in terms of geometric and dosimetric accuracy, against the standard planning done on a planning CT. To set the results in perspective, we also compare its performance against a synthetic sCT obtained via manual bulk density assignment. To our knowledge, this is the first time that a multi-atlas approach has been applied and evaluated for multiple regions, both H&N and prostate sites, in the context of RTP.

Data acquisition
Retrospective data from six H&N patients (with oropharyngeal cancer) treated with volumetric arc therapy (VMAT) and fifteen prostate patients treated with fixed-field intensity-modulated therapy (IMRT) were included in this study. Each patient had a planning CT scan (Philips Big Bore CT), a T1-and T2-weighted turbo spin echo MRI (Siemens 1.5T MRI), a CT delineated structure set and a clinically approved treatment plan (Pinnacle 3 , Philips Medical Systems) to a total dose of 65 Gy and 67-74 Gy for H&N and prostate patients, respectively. All patients were imaged on the same day and in the same position -head-first supine -for both MR and CT image sessions. For H&N patients, the same fixation device was used while, for prostate patients, a different couch was used for MR (curved couch) and CT (flat couch) imaging sessions. For all H&N patients, the resolution of both T1-and T2weighted MR scans was 0.104 Â 0.104 Â 0.2 cm 3 . For all prostate patients the resolution of T1-and T2-weighted MR scans was 0.164 Â 0.164 Â 0.5 cm 3 and 0.146 Â 0.146 Â 0.5 cm 3 , respectively. For H&N patients, the resolution of the planning CT was 0.117 Â 0.117 Â 0.2 cm 3 while, for prostate patients, it was 0.098 Â 0.098 Â 0.2 cm 3 . All patients included in this study had given consent for their data to be used for research purposes.
Because of the retrospective nature of the study, inconsistencies exist between the different imaging modalities acquired. A large field-of-view (FOV) was available for the CT scans (scanning level for H&N patients extends from the top of the head to the apex of the lungs and for prostate patients from the abdomen to the lower limbs). In contrast, the MR scans for both H&N and prostate sites where reduced in the cranio-caudal direction, only encompassing the region of interest including the planning treatment volume (PTV) (Figs. 1 and 2). In addition, for the H&N patients, the patient external outline was not fully covered in the MR images, which resulted in missing tissue at the back of the head and on the chin (Fig. 1). Note that this concerns less than 10% of the volume within the MRI FOV.

sCT generation
Two different schemes for the sCT construction were used: the proposed multi-atlas method (sCT a ) and the manual bulk density assignment (sCT bda ).

Multi-atlas CT synthesis
The approach for the generation of the sCT a has been described in detail in previous publications [18,26,27]. Briefly, the proposed method relies on pre-acquired pairs of non-rigidly registered T2and/or T1-weighted MR and planning CT images. The non-rigid alignment was necessary to compensate for position differences (and the use of different couches for the prostate patients) between the MRI and CT acquisitions. For the H&N patients, the atlas database was composed of seventeen pairs of T2-weighted MR and CT images, as described in [18], and the method was validated using the images of six other H&N patients not included in the database. Regarding the prostate patients, the atlas database was composed of both T1-and T2-weighted MR, and CT images of fifteen patients. The method was validated following a leave-one-out approach.
To generate the sCT a , the first step was to register all the MRIs in the database to the test subject's MRI. A robust affine registration [18] was used followed by a non-rigid cubic B-Spline registration using normalized mutual information as similarity measure, as implemented in NiftyReg 2 . The robust affine step guarantees that each atlas MR image is well aligned with the test subject despite the large differences in the FOV size and location that were observed between the subjects for both anatomical sites. The transformations were then applied to map the atlas CTs to the test subject MRI. The sCT a was finally obtained by fusing the mapped atlases according to their local similarity to the test subject using a spatial-varying weighted averaging [18].
An iterative process was then used to improve the synthesis [18]. First, the initial sCT a obtained as described above was combined with the test subject MR image(s). Then, all the CT-MR image sets in the atlas database were aligned to the sCT a -MR image set using a multi-channel non-rigid registration. The refined sCT a was finally obtained by fusing the registered atlas according to a similarity measure computed between the sCT a -MR and mapped CT-MR sets. Combining multiple modalities (MRI and CT) at both the registration and image similarity stages is expected to provide more realistic mappings and improve the local selection of atlases, especially in low contrast areas.  As the final step, the sCT a was aligned and resampled to the original planning CT space using the inverse non-rigid transformation mapping the test subject's CT and MR images, as the registration algorithm chosen is symmetric. This step was necessary to reduce the influence of the different acquisition positions while comparing CT and sCTs [25]. Examples of MR, planning CT and sCT a images are displayed in Figs. 3 and 4.

Manual bulk density assignment
To generate the sCT bda for each patient, the MR scans were nonrigidly registered to the planning CT applying the same transformations used to align the CT and MR images for the sCT a generation. The delineation of the different tissue classes (bone and air), followed by the assignment of specific physical density values to each class, was then carried out using the deformable T2weighted MR image sets. The rest of the body was defined to be of water equivalent density. For prostate patients, bone (1.22 g/ cm 3 ) and for H&N patients, bone (1.53 g/cm 3 ) and air (0.001 g/ cm 3 ) tissue classes were defined. Physical densities were defined according to the literature [8,9,28]. Bone delineation was performed manually as no efficient threshold exists for bone segmentation using traditional MR sequences. All delineations were done by the same person for consistency and were checked by an experienced physician for adequateness. Air delineation for H&N patients was done using a threshold-based delineation available within the RayStation (Raysearch Laboratories, Stockholm) treatment planning system (TPS). Low MRI intensity values were chosen (<8) and deviations were manually corrected. sCT bda images (Fig. 5) were constructed with the resolution of the original CT image.

Evaluation
The first stage of the evaluation consisted of assessing the accuracy of the generated sCT a and sCT bda . Then, the performance of all sCTs against the planning CT was evaluated in terms of geometric  and dosimetric accuracy. To reduce the effect of the image discrepancies detailed in Section 2.1, the performance of all sCTs was only evaluated within the FOV where MRI information was available.

Synthetic CT accuracy evaluation
To assess the accuracy of the automatically generated sCTs, the mean absolute error (MAE), defined as was computed for each subject between the sCTs and the planning CT in the external contour, in the bone region and in the soft-tissue region within the MRI FOV. N is the number of voxels x in the considered region. Similarly to Siversson et al. [25], the bone region was defined by thresholding the planning CT at 150 HU within the MRI FOV, and using morphological operators to include softer bone and bone marrow. The soft-tissue region was defined by thresholding the planning CT at À150 HU within the MRI FOV and subtracting the bone region.

Geometric evaluation
The geometric evaluation was performed using the clinical CT contours and delineations on the T2-weigthed MRI (sCT bda ) and on the atlas-based sCT (sCT a ). Both external and bone contours within the MRI FOV were evaluated. The external contour was delineated in all images using an automatic threshold tool in RayStation. For the MR images, bone contours were delineated manually while for the CT and sCT a images, the delineations were performed as mentioned before in Section 2.3.1.
The contours were first individually evaluated in terms of shape, position and volume. The contours' shape and position were visually inspected by overlaying the sCTs' segmented contours on the CT contours. Changes in volume were evaluated using a volume index (VI) [29]: where V(A) is the volume of the clinical CT contour and V(B) the volume of the evaluated contour. VI = 1 indicates identical volumes, while VI > 1 indicates a higher clinical than evaluated contour volume and VI < 1 vice versa. Finally, an overall evaluation of the contours was performed using the dice similarity coefficient (DSC): A DSC > 0.7 was considered a good overlap [30].

Dosimetric evaluation
The dosimetric analysis consisted of both gamma (Ɣ) and dosevolume histogram (DVH) analyses. To standardize comparisons between CT and sCTs' dose distributions, and to check for potential variability in structure definition, the dosimetric accuracy of the generated sCTs was validated using the clinical CT contours. Both organs at risk (OARs) and the target, delineated by a clinician, were rigidly copied from the planning CT to each set of sCTs. Due to the reduced MRI FOV, to simulate the whole body of the patient, the external contour delineated on the planning CT was copied to each sCT and altered in the MRI FOV to be able to maintain the original body outline defined on each sCT. To make a consistent evaluation of the dose distribution differences between image sets, all the regions within the body contour but outside the MRI FOV were assigned to be of water equivalent density in both CT and sCT images ( Figs. 1 and 2). For the H&N patients, to maintain the use of the original external contours despite the lack of MRI coverage, the missing tissue in the back of the head and on the chin (Fig. 1) was assumed to be of air equivalent density. For each patient, the original clinical plan was re-calculated using the original planning parameters on the new density override CT and sCTs geometries. Both sCT bda and sCT a were evaluated. All dose calculations were performed using the RayStation TPS with a dose grid of 0.25 Â 0.25 Â 0.25 cm 3 . Furthermore, in this study the dosimetric influence of the patient couch was of no concern as couch density was set to air in the density override planning CT for the dose recalculations and it was not present in the sCTs' image sets.
For the Ɣ-evaluation, a local 3D algorithm implemented in Plastimatch 3 , with constraints of 3% dose difference (DD) and 3 mm distance to agreement (DTA), and 2% DD and 2 mm DTA, using the density override CT dose distribution as reference, was applied. The information from the Ɣ-maps was summarised by calculating the percentage of passing points within the MRI FOV (Ɣ 1).
DVH metrics including the percentage point difference (PPD) were evaluated for the clinical PTV and OARs cropped within the MRI FOV. The PPD was calculated using the dose value for a specific DVH point in the density override CT dose distribution as the ground truth and the same point in the sCT dose distribution as evaluation. For the target volume D 98% , D mean and D 2% were calculated where D x is the dose given to x% of the structure volume and Dmean is the mean dose given to the evaluated volume. D 98% and D 2% were used to evaluate the minimum and maximum dose given to the structure, respectively. For the OARs, only D mean and D 2% were determined. Spinal cord and right and left parotids for the 3 http://plastimatch.org/plastimatch.html. H&N patients, and rectum and bladder for the prostate patients were evaluated.

Synthetic CT accuracy evaluation
The average and standard deviation of the MAE obtained between the sCTs and planning CT images are presented in Table 1. We note that the synthesis error is higher for the H&N patients than for the prostate patients. Fig. 6 displays overlays of the external contours for two H&N patients representing the best (Fig. 6 (a)) and worst-case scenario ( Fig. 6 (b)).

External contours
For the H&N cases, despite the best efforts (same patient positioning and immobilization) small discrepancies in patient positioning and rotation between the CT and MR acquisitions were unavoidable and, for a small number of patients (n = 2), a clear difference in the contours was visible ( Fig. 6(b)). As a result, these dissimilarities will introduce dosimetric challenges. For the prostate patients, after performing the non-rigid transformation for positioning correction between the CT and MR images, no systematic differences between contours were seen.
In general a good qualitative agreement was observed for the external contour between the images. However, the sCT bda -based delineation was systematically a few voxels smaller than the planning CT contour. In Fig. 7, a two-step drop of intensity over a few voxels can be seen in the MR images until the intensity of air outside the patient is reached while for CT images a clear drop is seen. These differences created a systematic difference in the external contours for all patients. When defining the external contour on sCT a images, a higher degree of similarity with the CT contour was observed.
The VI results for the external contours are displayed in Table 2. We can see that the external contour volume is underestimated (VI > 1) for both sCT bda -and sCT a -based delineations due to the blurry MR boundaries (Fig. 7). Underestimation of the external contour volume is higher for the H&N patients due to the lack of MRI coverage (Fig. 1). However, volumes on sCT a agreed more closely to the original CT volumes.
The DSC values are displayed in Table 3. A high overall similarity with the original contours was achieved for the external contours for all images as the DSC values were larger than 0.7. Fig. 8 represents the bone contours for a representative H&N and prostate patient. Deviations in shape were observed between the original CT and sCT bda bone contours ( Fig. 8 (a)) as a result of the MR-manual delineation and poor bone visibility on the T2weighted MR sequence. A higher degree of shape similarity was achieved for the sCT a images.  The VI results for the bone contours are displayed in Table 2. A clear trend can be seen for both groups of patients. The sCT bdabased contours were smaller (VI > 1) and the sCT a -based contours were larger (VI < 1) than the CT-based contours. These differences result from the poor bone visibility on MR and from the blurriness introduced by the atlas method.

Bone contours
The DSC values for the bone contours are displayed in Table 3. As for the external contours, a high overall similarity with the original contours was achieved (DSC > 0.7). However, the high DSC values seen for the multi-atlas approach indicate a closer overall agreement between CT and sCT a -based delineations.

Dosimetric evaluation
Figs. 9 and 10 display gamma maps (2%_2 mm) for representative H&N and prostate patients, respectively. The percentage of passing points for each sCT is detailed in Table 4.
All sCTs displayed a high number of points failing the gamma criteria close to the skin, due to the external contour differences. When using sCT a , a greater similarity with the CT dose distribution was observed: a greater number of small gamma values (0-0.3) and a greater number of passing points were obtained when compared to the sCT bda approach.
The PPD between the DVH points from the CT and the generated sCTs are displayed in Tables 5 and 6 for H&N and prostate patients, respectively. Considering all the patients, the mean PPD for the PTV coverage using all sCTs was less than ±0.7% for both the H&N and prostate patients, reaching a maximum individual difference of ±2% of the original dose value. For all evaluated DVH points, patient-specific results were variable.
For the H&N patients, the mean PPD for the OARs was less than ±0.5%, with the maximum individual difference equal to ±1.5%. For the prostate patients, the mean PPD for the OARs DVH points was less than ±0.9%, with the maximum individual difference equal to ±2.0%. For these patients and for the majority of the DVH points analyzed, sCT a tended to have the best agreement with the CT results. However, as verified for the PTV, patient-specific results vary and there is no obvious advantage of using a specific sCT method for dose calculations.

Discussion
To establish an MRI-only RTP workflow, ensuring accurate dose calculations and geometry delineation from the patient's MR images is of key importance. This work presents a feasibility study where clinical CT-based dose distributions were compared with  those obtained from sCT images generated by our proposed multiatlas CT synthesis method and by bulk density assignment. As a first step in the evaluation, we assessed the accuracy of the sCT a obtained with the proposed multi-atlas approach. The MAE obtained within the external contour for the prostate patients was on average 49.8 ± 4.6 HU, which is lower than the error obtained by Kim et al. [20] (74.3 ± 10.9 HU) and is of the same order as the MAE obtained by Siversson et al. [25], Dowling et al. [19] and Andreasen et al. [24] (36.5 ± 4.1 HU, 40.5 ± 8.2 HU and 54 ± 8 HU, respectively), when taking into account the fact that the images used in the present study had a lower resolution. The synthesis error was higher for the H&N patients as the neck is a more challenging area for registration algorithms because of the mixture of bone and air, and due to the presence of large-scale postural changes between patients, such as flexion or extension of the neck and the position of the jawbone.
Then, we carried out a geometric evaluation where the CT-based (used clinically), MR-based (used for sCT bda ) and  sCT a -based delineations were compared using the MAE, VI and DSC. The external and bone contours were very similar when delineated on either the original planning CT or sCT a . When comparing the sCT bda and CT bone contours, the synthesis error (MAE) was higher, and obvious deviations in shape and in volume were observed. These can be explained by the use of constant HUs for each tissue class to build the sCT bda , inter-observer variability, as the delineation was performed manually, and the poor bone visibility in conventional MR sequences. Currently, several groups are working with ultrashort echo time (UTE) sequences to obtain  a discriminant signal from bone [16,31]. In UTE imaging, as the signal is sampled during the free induction decay, before the signal from bone has vanished, it is possible to distinguish bone from air. However, for the MRI-only workflow the number of different MR sequences that can be obtained is limited due to time constraints. Thus, an additional UTE sequence for better bone definition might not always be available. Differences in the delineation of the body contour arose due to differences in the set-up between the CT and MR imaging. Despite acquiring data on the same day and using the same fixation devices, larger geometrical differences were found for the H&N patients. The potential impact of daily set-up variations between imaging sessions at these sites has already been evaluated in the literature [32,33]. The mean average set-up error in any single dimension is reported to be up to 4 mm. In addition, MRI usually does not express a clear boundary which hinders the external contour delineation. For future studies, it will be crucial to identify voxels at the boundary that lie outside the patient to omit further interference while defining the patient outline and while performing the required registration processes for the atlas approach.
The last step of the evaluation consisted of comparing the dose distributions obtained from the sCT bda and sCT a with the CT dose distributions. For both the target and the OARs, both sCT-based dose distributions differed from the corresponding CT-based dose distribution, on average, no more than 1% of the original dose. These results are comparable to those already presented in the literature [8,9,11,19,23,24]. Mean percentages of passing points within the external contour of 98-100% and 94-97% were achieved for both methods and cancer sites for the 3D local Ɣanalyses with constraints of 3%_3 mm and 2%_2 mm, respectively. Ɣ-pass rates of the same order were reported by Korhonen et al. [11] (93-97% for 1%_1 mm 2D Ɣ-test), Jonsson et al. [15] (99% for 3%_3 mm Ɣ-test), Dowling et al. [19] (93-96% for 2%_2 mm 3D global Ɣ-test), Uh et al. [23] (98-99% for 2%_2 mm Ɣ-test), Andreasen et al. [24] (97% for 1%_1 mm 2D global Ɣ-test) and Siversson et al. [25] (99-100% for 2%_1 mm local Ɣ-test). Furthermore, dose distributions based on sCT a showed a better PTV agreement and a more homogeneous gamma map with lower gamma values than sCT bda . As for the multi-atlas scheme a one-to-one estimation for each electron density voxel value is assigned, a greater similarity with the original dose distributions is expected. The magnitude of these dosimetric differences will also depend on the planning parameters (VMAT or multi-field plan), and on the geometry of the patient. In general, higher dosimetric differences were found for the H&N patients. These could be explained by the lack of MRI coverage and to the difficulties added by the large-scale postural changes between imaging sessions in the registration processes. Furthermore, these patients are more sensitive to dose errors as a mixture of bone, air and soft-tissue is present, while for prostate patients, the irradiated volume consists mostly of bone and soft-tissue.
The results of this feasibility study showed that both bulk density assignment and multi-atlas methods are suitable to perform dose calculations. Both approaches showed a good performance despite the limitations introduced by the suboptimal retrospective data: limited MRI FOV, the use of images from different scanners in the atlas and test population for the H&N patients and the presence of geometrical distortion within the MRI images.
As a result of the limited MRI FOV, large systematical differences within the beam path between the original CT and sCTs would be expected. To overcome this limitation, a density override approach assigning water equivalent density to all regions outside the MRI FOV but within the CT external contour was used for both CT and sCTs. This approach assures an evaluation as fair as possible, but in our opinion does not artificially improve the results as differences between CT and sCTs would only be related to electron density changes within the MRI FOV. In addition, for the H&N patients, the patient external outline was not fully covered in the MR images, which resulted in missing tissue at the back of the head and on the chin. These regions were assumed to be of air equivalent density. Filling these tissue gaps with water density could lead to results that could be better than in the true clinical situation whereas assuming an air density represents a 'worstcase scenario'. Nevertheless, there is only missing tissue in a small number of slices (<10% of the sCT external volume) resulting in a minimal impact on the geometrical evaluation and a reduced effect on the dosimetric evaluation. This problem should be easy to overcome in the future when radiologists are aware that MR scans will also be used for RTP. Imaging protocols should be adapted for the FOV to cover the entire body contour and not only the PTV region.
Building a reliable atlas database is a pre-requisite to guarantee the good performance of this atlas-based approach. CT and MR images need to be acquired for a number of patients on the same day, under treatment position and using the same fixation devices. Ideally, all data should be collected using the same MR sequences and scanner, as MR intensities are highly dependent on the equipment. However, establishing scanner-specific atlases would be challenging or even unpractical considering the clinical reality. By testing our approach using data from different scanners, as for the H&N patients, we demonstrated the robustness of our method to these differences. Nevertheless, using as atlases images of patients acquired with the same sequence and on the same scanner as the test patient would improve the results.
MRI is also known to suffer from geometric distortions owing to the non-linearity of the imaging gradients over large fields of view. Spatial distortions in MR images vary with field strength and with the image acquisition protocol, which explains the difficulty to provide a general estimation on their magnitude. The development of correction techniques is a very active field of research in the MR community, and we expect the impact of these distortions in the context of photon radiotherapy to become insignificant in the future. A remark to consider is that patient-specific distortions due to magnetic susceptibility or imaging artifact in the MRI present a limitation for the generation of sCTs. For CT images, artifacts can be manually delineated, overwritten with appropriated density values and in this way corrected. Thus, for the sCT bda approach they would not represent a restriction, but would compromise the performance of segmentation and density assignment. For an atlas-based approach, patient-specific abnormalities that are not represented in the atlas generation are a limitation and an exclusion criterion. However, this only concerns a limited number of patients.
Despite these limitations, a good dosimetric performance was achieved for both methods. However, the geometric evaluation urges caution. Bone and external delineations can only be performed automatically, and with a high degree of similarity with the planning CT, using the sCT a . For sCT bda , bone delineation has to be performed manually which is a very time consuming task, is subject to inter-observer variability, and is performed as a best guess, making this method unsuitable for clinical use. In contrast, the proposed atlas method automatically generates an sCT in around three hours without performance optimization. In the future, we suggest combining soft-tissue and target contours delineated directly on the MR image, with bone contours and HUs obtained from the proposed multi-atlas approach. Since the sCT a is created in the same space as the MRI, the definition of the soft-tissue structures and target on the MRI can be easily propagated to the sCT a for planning. As bony structures in the sCT a images were shown to be consistent with the original CT, this image could also be used for patient positioning, at least for H&N patients where positioning relies on accurate bone geometry.

Conclusions
In this paper we evaluate two methods for MR-based RTP: the proposed multi-atlas CT synthesis method (sCT a ) and a bulk density assignment method (sCT bda ). Plans re-calculated on both sCT a and sCT bda showed an overall good dosimetric agreement with the clinical CT plan. However, only sCT a can give an accurate bone delineation enabling patient positioning during treatment. Combining MR delineations with our multi-atlas scheme could improve the dosimetric and geometric accuracy of the treatment, and reduce the number of imaging procedures. Note that, due to the use of suboptimal-retrospective data, the results from this study should be interpreted as a conservative worst-case scenario.
Several methodological novelties were presented to guarantee the robustness of the proposed multi-atlas CT synthesis approach: (1) a robust affine was proposed to ensure the correct alignment between each atlas and the test patient; (2) the method was extended to handle multiple MR contrasts; (3) an iterative process was proposed to improve the synthesis accuracy in the bone region. To the best of our knowledge, this is the first time that a CT synthesis approach has been able to generate accurate sCT images for RT planning for both H&N and prostate patients.