Choreography Controlled (ChoCo) brain MRI artifact generation for labeled motion-corrupted datasets

MRI is a non-invasive medical imaging modality that is sensitive to patient motion, which constitutes a major limitation in most clinical applications. Solutions may arise from the reduction of acquisition times or from motion-correction techniques, either prospective or retrospective. Benchmarking the latter methods requires labeled motion-corrupted datasets, which are uncommon. Up to our best knowledge, no protocol for generating labeled datasets of MRI images corrupted by controlled motion has yet been proposed. Hence, we present a methodology allowing the acquisition of reproducible motion-corrupted MRI images as well as validation of the system’s performance by motion estimation through rigid-body volume registration of fast 3D echo-planar imaging (EPI) time series. A proof-of-concept is presented, to show how the protocol can be implemented to provide qualitative and quantitative results. An MRI-compatible video system displays a moving target that volunteers equipped with customized plastic glasses must follow to perform predefined head choreographies. Motion estimation using rigid-body EPI time series registration demonstrated that head position can be accurately determined (with an average standard deviation of about 0.39 degrees). A spatio-temporal upsampling and interpolation method to cope with fast motion is also proposed in order to improve motion estimation. The proposed protocol is versatile and straightforward. It is compatible with all MRI systems and may provide insights on the origins of specific motion artifacts. The MRI and artificial intelligence research communities could benefit from this work to build in-vivo labeled datasets of motion-corrupted MRI images suitable for training/testing any retrospective motion correction or machine learning algorithm.


Introduction
MRI is an essential imaging modality in medicine, which suffers unfortunately from a great sensitivity to movements that can deteriorate image quality. In general, relatively long acquisition times (in the order of minutes) are required to obtain high resolution structural MRI images. Hence, there are situations where motion during scan is unavoidable.
Many researchers have devoted their efforts towards motion impact mitigation by reducing acquisition times or by designing new motion insensitive a.k.a. ''motion robust'' pulse sequences as described in [1]. The latter can leverage non-Cartesian k-space sampling strategies (e.g., spirals, radial), such as the PROPELLER [2] or SNAILS [3] methods, which usually over-sample the center of k-space.
Several methods have been proposed for motion-tracking that use either external devices or navigators [4]. Widespread systems are based * Correspondence to: University of Geneva, route de Drize 7, 1227 Carouge, Switzerland.
E-mail address: oscar.dabrowski@unige.ch (O. Dabrowski). on cameras coupled with special markers [5,6]. One main drawback of such methods is the need for calibration, which can be quite challenging [7,8] especially with ''out-of-bore'' camera systems [9]. An alternative to visual systems is the use of electro-magnetic (EM) pickup coils [10,11]. These methods require modifications of the pulse sequence to generate gradient pulses needed for the calibration of the device in the reference frame of the MRI system [12]. Navigator echoes offer an attractive means for either prospective or retrospective motion correction [13,14], as they use the MRI intrinsic coordinate system. In particular, the FIDNav technique can be used, which leverages multi-coil antennae to obtain motion information [15]. Much like EM tracking, these navigator-modified sequences increase scan duration and the method may not be compatible with every type of pulse sequence.  Regardless of the chosen method, whether calibration is done by choreographed head movements or by simulation, there is a need to obtain controlled head motion acquisitions together with reproducibility validation for motion correction quality assessment [16]. However, to the best of our knowledge, no dedicated system is available for the investigation of controlled motion impact on MRI scans in a straightforward, inexpensive and versatile (i.e., compatible with any scanner or pulse sequence, provided that a screen is available to display the visual stimulus) manner.
Motion correction is a complex problem, which depends on many parameters such as pulse sequence, type of motion (where it occurs in k-space, abrupt vs progressive motion, etc.) and also on the motion detection/estimation methods. Therefore, it is often difficult to know the exact link between an observed artifact and a given motion [17]. So far, researchers in the field have mostly focused on ''free motion'' and there is a need for labeled motion-controlled datasets acquired within ranges that are most likely to occur during clinical scans. This kind of labeled data should allow the evaluation of motion correction techniques in a more tractable and systematic manner.
We propose a general and straightforward method allowing reproducible head choreographies to be performed by subjects inside any MRI scanner equipped with a video display system. The choreographic control is based on the visual pursuit of a target solely by the subject's head displacement. The performance of the controlled motion system is validated by estimating motion during a fast time series acquisition. In practice, each subject undergoes a target-tracking training session during a fast echo planar imaging (SMS-EPI) [18,19] acquisition, from which subject motion is estimated by rigid-body volume registration. Then, the same type of motion is performed during standard structural brain imaging in order to create corrupted images based on controlled head motion. Any sequence can be used to produce corrupted images with known motion patterns, provided that the choreography control video is synchronized with the start of the sequence and the availability of an MRI-compatible screen (as in a standard fMRI setup).
Previous work has shown that abrupt head motion can be as fast as 200 ms [20] and EPI repetition times are usually of the order of the second. In order to improve the accuracy of motion estimation of EPI time series, especially with rapid motion occurring within EPI repetition time (TR), a spatio-temporal upsampling and interpolation method is proposed. It leverages the interleaved simultaneous multislice (SMS) scheme used in conjunction with EPI. Methods leveraging the SMS scheme have been proposed, such as [18], but the usage of Kalman filtering may hinder detection of fast abrupt motion and no implementation is freely available. We provide a Matlab implementation for our method at: https://gitlab.unige.ch/Oscar.Dabrowski/te mporally-improved-volume-registration-by-data-driven-subvolume-inte rpolation-of-simultaneous-multislice-epi-time-series.
Our motion-control method enables the creation of labeled datasets of motion-corrupted MRI images that could be used for motion correction algorithm benchmarking or serving as the basis for motion artifact simulation, for a better understanding of the phenomenon. The protocol is designed to allow reproducible motion artifacts within a known range of parameters and following a precise timing. Among many applications, the dataset obtained following this procedure may be used as a test set to assess the performance of any retrospective motion correction technique.

System setup
Motion instructions were provided to the subjects via an LCD screen placed at the back of the MRI and viewed by a mirror mounted on the standard head coil. Subjects wore a modified pair of plastic glasses to restrict their field-of-view (Fig. 1). The idea of motion tracking with this equipment is to mimic the same principle as following a moving target with binoculars or a spyglass. One eye is masked, and for the other, a pinhole forces an eye-pinhole-marker alignment. It has been shown that head and eye motion both contribute to tracking during a visual pursuit [19]. Using glasses with a sight (masking crosshair marker), naturally constrains motion to the head only, removing any eye tracking contribution. This simple device alleviates the experiment from any variation in instruction comprehension or implementation such as subject ability to not move the eyes while tracking.
To generate a suitable target pursuit movie for the head choreography, the distance in pixels on the screen is expressed as a function of the head angle with respect to the rest position as shown in Fig. 1: where = + ′ is the distance between the subject's eye and screen, is the screen width in centimeters and is the horizontal resolution in pixels.
The impact of the distance measurement inaccuracy is clearly negligible compared to the subject's angle reproduction precision (human error), considering that a ±1 cm distance error translates to an error of approximately ±0.0065 degrees.
As a proof of concept of the system, a choice had to be made with respect to the motion parameters and pulse sequences to use. The widely used 2D (T1-w-SE) and 3D (T1-w MP-RAGE) sequences were chosen as well as EPI for motion estimation. Two motion types were chosen: ''abrupt'' (fast) and ''motion and hold'' (slower), with 3 different angles ∈ {1 • , 2 • , 3 • } representative of the range of motion which may occur during a clinical scan.
Due to the restricted field of view, all relevant information regarding head choreographies was displayed in the central region delineated  by the pinhole (Fig. 1). Head choreography movies were designed with specific timings corresponding to fast abrupt and slower motion and hold motion types in up/down and right/left directions. A set of symbols shown in Fig. 1 informs the subjects on which movements to perform. A countdown timer is displayed during rest periods and two seconds before motion onset, a crosshair appears on the targeted angle to avoid a ''surprise'' effect, which could potentially lead to ''overshooting''. To obtain accurate timings for the occurrence of motion artifacts, the video must start synchronized with the beginning of the acquisition sequences. The MRI scanner device-specific sync setup used here is described in Appendix D.

Subject preparation
Before sending volunteers inside the MRI scanner, their interpupillary distance is checked to match with the glasses pinhole (adjustments were rarely needed). Then, subjects are placed on the MRI table and equipped with a mirror-mounted head coil. Glasses vertical position is adjusted on subject's faces and the glasses are taped on their forehead and temples. Finally, subjects are sent to the isocenter, and final left/right adjustments of the screen is performed following their feedback.

Experimental protocol
All experiments were performed at University Hospitals of Geneva on a 3T MRI system (MAGNETOM Prisma fit, Siemens Healthineers, Erlangen, Germany) with a 20-channel receiver head coil. Six volunteers participated in the study.
After the acquisition of no-motion T1-w Spin-Echo and T1-w 3D MP-RAGE sequences for reference, the protocol consists of an EPI (training) sequence used for motion estimation. Spin-Echo and T1 3D MP-RAGE sequences were then acquired for each type of motion.

Sub-TR volume-to-volume registration for accurate abrupt motion estimation
Two different situations may arise for practitioners using the proposed protocol. Either the motion is slow or it is fast with respect to TR period. In the latter case, motion will be undersampled, therefore misestimated. The phenomenon can be understood in a similar wayalthough not strictly identical -than the Nyquist-Shannon theorem that requires a sampling period (TR) being two times shorter than the fastest motion period to be estimated. In our experiments TR = 600 ms, and fast head motion ''period'' was estimated to be around 800 ms, hence a TR = 400 ms would be needed at least. In the current application, we performed a threefold increase in temporal upsampling, which would approximately correspond to TR = 200 ms.
The goal of EPI acquisitions is to estimate head motion in relation with the instructions displayed on the screen. As just mentioned, rapid motion may be too fast to be properly sampled by EPI with TR = 600 ms, which leads to its underestimation. In order to have more accurate motion estimation of rapid movements, the SMS scheme can be leveraged to perform temporal upsampling of the EPI acquisitions, which, in turn, leads to spatial undersampling which can be dealt with interpolation.
The method is based on splitting entire EPI acquisitions into multiple time series (TS) according to the slice ordering of the SMS acquisition. For the proof of concept presented here, 36 slices/volume were acquired during each TR, with an SMS factor of 4. We separated the time series into 3 TS (TS1, TS2, TS3) corresponding to 12 slices per volume. This choice represents a good compromise between spatial and temporal resolution. Since incomplete spatial coverage impairs realignment algorithms, each subvolume was spatially interpolated. The proposed interpolation algorithm computes optimal affine pairwise transforms between adjacent slices, i.e., from slice to slice + 1 and from slice to slice − 1, referred to as ''forward'' and ''backward'', respectively. The interpolation is obtained by weighted averages of forward and backward transformed slices in a given subvolume according to hand-crafted weight matrices constructed for each time series. The latter matrices specify the location of existing slices and the weights to use. A qualitative example of interpolation is provided in Fig. 2. Each time series was processed separately. They were each realigned with respect to their first -fully interpolated -corresponding volume with SPM [21]. The latter uses an ''intensity-based'' registration algorithm, i.e., based on a MSE metric between volumes.
A fully upsampled (×3) time series can be reconstructed by temporally interleaving the motion parameter vectors. A zoom on two motion blocks shows a comparison of the realignment results between the original (non-upsampled, non-interpolated) acquisitions and the same acquisitions processed with our method (Fig. 3). Rigid-body volume registration was performed with respect to a temporally close volume to avoid baseline offset between time series that results in apparent oscillation of the timeline as observed in Fig. 3(a). In practice motion estimation is performed by considering each time series independently and is not affected by the choice of the reference volume.

Data postprocessing
The interpolation and upsampling method described previously was applied to each subject's EPI data and motion parameters were obtained for each time series separately.
Motion parameters were computed considering local baselines, i.e., resting positions before each motion onset (Fig. 3). Ideally, the baselines should be at zero degrees, but unavoidable variations do occur due to deviations in the spatial content of the slices, when registration is performed with respect to the initial volume (instead of a temporally closer volume, as mentioned in Section 2.4). For the ''abrupt'' motion type, the maximum peak value among each time series was considered. For the ''motion and hold'' type, the maximum average value of the ''plateau'' points, among each time series, was taken.
Descriptive statistics are presented in the form of boxplots and curves displaying signed errors between estimated angles with respect to target values. Boxplots aggregate all subjects' data together to describe overall error whereas curves describe the temporal error-wise behavior of each subject. Both boxplots and curves take baselines into account for error computation.
Representative artefacted Spin-Echo were obtained from raw multicoil k-space combination using the ESPIRiT algorithm [22]. Representative MP-RAGE DICOM images are also provided to allow qualitative estimation of motion artifacts.

Dataset labeling
The screen defines a 2D projection of the corresponding 6 DoF that subjects are performing. Motion estimation obtained by rigid-body volume registration of EPI training sessions gives empirical labels for acquisitions performed thereafter. Data labeling can be done by extracting the corresponding average motion parameters from the training acquisition once subjects are trained, (after 2.5 min, see results). Hence, in the context of training a supervised deep learning model [23], it is possible to build a labeled dataset { ( ) , ( ) }, where is the dataset element index, and where each MRI image ( ) is associated with its empirical target ( ) ∈ R ×6 , a matrix of 6 motion parameters for each of the phase encoding lines (e.g., in the case of a classical Spin-Echo acquisition).

Choreography training sessions and motion estimation
For each participant, the experiment begins with a target tracking training session during repeated SMS EPI acquisitions that allows the investigation of subjects' spatio-temporal motion accuracy using rigidbody registration, as well as their learning behavior. The EPI training sessions also provides the labels for the dataset of corrupted MRI images. As described in the last methods subsection, the stimulus parameter space (2D) maps to the spatial parameter space of a subject's head (6-DoF), that corresponds to the ''response function'' for each subject. This includes the theoretical 6 DoF motion as well as any experimental fluctuations (e.g., internal constraints such as subject neck/back pain as well as external parameters such as excessively inflated air cushion padding, etc.). In this work, we rely on intra-session repeatability to ensure consistent motion control that can be applied reliably in subsequent structural MRI acquisitions. Thus, the training session provides the real motion labels for the data, which are more accurate than ''theoretical'' target motion labels. Consequently, our approach aims at accurately estimating the motion even if the subject did not perform the motion accurately (as long as the subject performance is consistent within the session). The EPI training session is therefore required for each subject and each session not only to validate motion reproduction but also to obtain the empirically labeled dataset we aim at. Fig. 4 shows rotational motion estimations for a given subject during a training session after the corresponding upsampled and interpolated EPI acquisition has been realigned with SPM. The first row represents the targeted movements that should be performed by the subject.
Qualitatively, we observe that the subjects motion patterns match the targets well. The ''Up/Down'' motion angle amplitudes are generally smaller than the ''Right/Left'' ones (Figs. 4 and 5). According to subject's feedback, this can be explained by the ''Up/Down'' motion being more difficult to perform accurately.
The other two rows show the effective movements made by the subject as estimated by rigid-body EPI volume registration. Volumes acquired during each EPI sequence are realigned with respect to the first volume to obtain rigid-body motion parameters allowing subject motion estimation. It can be observed that the volunteer (S2) ''overshoots'' the target angles at the beginning of the acquisition but stabilizes relatively fast, i.e., after about the first two motion blocks (''motion and hold'' and ''abrupt''). To prevent potential overshooting, a countdown timer is displayed 2 s before each motion starts and a crosshair appears on the targeted angle. This ''anticipation timing performance'' process proved to improve time response and motion accuracy [24].

Intra-subject motion repeatability and baseline estimation
The starting position is by convention the zero-degree position. It can be observed that the baseline fluctuates over time because subjects  need to get accustomed to the expected dot position and speed. After about one trial of each motion type, subjects tend to reach a more stable baseline. Hence, they undergo a training phase. Baseline variation factors may include slight position readjustments during the experiment (e.g., if the subject felt uncomfortable or swallowing, coughing, etc.).
Different deviations from the ideal baseline can be observed among different subjects. However, it can be observed that the baseline offsets within each group of (fast) abrupt movements are relatively constant (Fig. 4, frame a and b). When the duration between motion events is longer, i.e., in motion and hold blocks, the offset varies more. This observation could be explained by proprioceptive memory that is known to allow subjects to retain accurate motion memory during a repetitive movement reproduction task, when it is performed within short delays between trials [25]. Therefore, to obtain a reliable angle amplitude measurement, we estimate a local baseline before each motion onset. The actual angles are computed by taking the maximum peak amplitude for abrupt motion and the mean angle amplitude for sustained motion with respect to the local baseline for each time series (Fig. 3).

Temporal resolution increase by volume splitting and fast motion estimation
Taking the maximum angle value is justified because temporal upsampling removes mixtures of different movements occurring within one TR by splitting them into separate time series. Therefore, the highest peak is the best motion estimation available.
Abrupt peak detection and mean angle computation between motion and hold peaks was performed automatically with a sliding window approach (similar to some well-known peak-detection algorithms as in [26]), but with prior knowledge of the target, which allowed us to directly place the search window around the correct positions and avoid outliers.
With temporal upsampling and spatial interpolation, a significant abrupt peak amplitude improvement is obtained as shown in Fig. 3b. The typical amplitude gain is shown. As expected, motion amplitude estimation for ''motion and hold'' sequences does not change much with upsampling, since head positions are maintained for periods longer than the duration of one TR (acquisition of one volume). We recall that the goal of temporal upsampling is to improve the estimation of fast motion (occurring while a volume is acquired), which, in turn, allows to validate motion reproduction.

Subject motion accuracy and reproducibility
It is important to point out that we care about intra-session reproducibility, i.e., that a subject is able to perform consistently within a session, since the motion pattern estimated by EPI time series realignment is applied to structural imaging performed during the same session. It likely that the response function cannot be applied in another session, since the estimated motion parameters even for the same subject is prone to change significantly due to experimental conditions (e.g., different FoV, and other parameters mentioned in Section 3.1).
In Figs. 5 and 6, errors with respect to the target are reported by type of motion (U/D or R/L, abrupt or motion&hold) and by target angle amplitude. In Fig. 5 the data acquired for all subjects is aggregated in each boxplot to show the error distributions for each target angle and motion type. The first measurement for each subject was discarded due to the effect of transient adaptation to the task. Hence, each boxplot represents the distribution of 6 subjects * 4 measurements = 24 data points per target angle and per motion type. In Fig. 6 the 5 measurements (i.e., including the first) for each motion block (abrupt or motion&hold) are shown separately. Each color corresponds to one subject. Measures are connected by a line to make the graph easier to read. The dashed line is the average measure for each subject at each point and the gray shaded area shows one standard deviation above and below the mean. The error is computed by taking local baselines into account, for each subject. It is the difference between the target angle and the effective angle. Each row shows results for the same type of motion and each column corresponds to a given angle.
For all subjects, motion accuracy error increases with angle amplitude. It can also be observed that subjects were able to quickly generalize the learning process. Considering S2: we can infer that there was a ''strong learning effect'' during the first two Right/Left motion blocks (Fig. 4). Afterwards, subject angle accuracy is quite stable even for the first Up/Down motion blocks, although there is a slight improvement near the end of the acquisition. Moreover, rapid convergence can be observed in Fig. 6, mostly in the first row, i.e., motion&hold in Right/Left direction, as the standard deviation decreases quickly. That direction (Right/Left) corresponds to the first training session and suggests that most of the learning was achieved there. Indeed, in the third row, i.e., for motion&hold in the Up/Down direction, no such phenomenon is visible, and the standard deviation is very low. This leads us to the conclusion that subjects are able to generalize well the task to perform from only a few training blocks. Therefore, other researchers interested in this method can reduce training time and perform a single training session with a mixture of the different motion types, according to their needs.

Generating motion artifacted images
Representative images acquired with the proposed protocol are presented in Fig. 7 to illustrate the type of motion artifacts produced. The angles used for head choreographies were chosen empirically, based on the range of angles observed in clinical exams, and arbitrarily set to three angles covering this range, namely 1,2, and 3 degrees. The assumption being that it is unrealistic for a given subject -except in young pediatric patients -to move much more than this in a padded head coil. The proposed head choreographies used the same motion patterns than the EPI training sessions.
Classical T1 Spin-Echo is widely used in clinical settings for Gduptake scans. Hence, studying such kind of pulse sequence presents a practical interest and has the advantage of following a simple sequential and Cartesian k-space sampling scheme. Examples of artifacts produced during a 2D Spin-Echo acquisition are shown in Fig. 7. Motion artifacts are qualitatively similar for all subjects and take the form of ''ringing artifacts'' in the phase encoding direction (R/L).
An example of empirical labeling with 6 DoF is shown in Fig. 8. The average k-space magnitude over all slices of the corresponding acquisition (Spin-Echo) is shown next to the label to highlight the corrupted regions. Motion impacts mostly the phase-encoding direction, since this is the slowest one to acquire. The effect of motion can be seen as dark lines appearing in k-space magnitude at phase encoding lines corresponding to motion onset. This well-known artifact [27] depends on the position of affected lines in k-space as well as the type of motion.
The 3D MP-RAGE acquisition presents more complex patterns as it uses two phase encoding directions, namely phase encoding per se (A/P) and slice encoding (R/L). Therefore, visual inspection of kspace magnitude is not as informative as for a classical Spin-Echo. Representative examples of motion artifacts observed in 3D T1 MP-RAGE acquisitions are shown in Fig. 7. Mild ringing artifacts can be observed in the frontal regions of the sagittal and axial slices (see Fig. 7. Sample of acquired Spin echo (axial) artifacts (a, b, c) and MP-RAGE (coronal) (d, e, f). It can be observed that artifacts are qualitatively similar (ringing/ghosting with similar spacing) and clearly visible for all subjects, which is what we aimed to achieve. MP-RAGE artifacts are milder than those observed on Spin echo acquisitions. Appendix) and towards the upper right parts on the coronal views. The artifacts are qualitatively similar for all subjects, which suggests that head choreographies could be well reproduced. In addition to being visually similar, the artifacts appear in the expected regions. The proposed method is usable as such for repetitive movements because it can be considered as an ''anticipation timing performance'' task and subjects response time can be neglected [24]. Our system would allow generating head choreography tracking movies based on real motion data (estimated from EPI sequences or by other means) but in that case, subjects' temporal accuracy should be measured.
To summarize, volume registration confirms correct head choreography reproduction and provides empirical labels for the dataset. The same choreography performed by different subjects resulted in qualitatively similar artifacts, indicating that the method may be useful in artifact simulation validation. Our experimental protocol providing well-defined motion patterns and amplitudes will help to better understand the effects of motion on MRI images and will provide a practical basis to compare or validate motion models. The method can easily be extended to more complex motion patterns according to specific needs.

Limitations of the system
The proposed approach is inherently limited to head motion. The screen is a 2D projection of the 6-DoF movement of a subject's head. Left/right or up/down motion on screen will be translated to a combination of these 6-DoF (mainly yaw and pitch, respectively). Roll motion is difficult to reproduce accurately and would require adaptations of the system. A rotating crosshair could be displayed on the moving dot and customized optics could be adjusted on the glasses.   It would not be practically feasible to constrain a subject to do pure translations, but this type of motion is not realistic in the scope of a clinical scan, except for a slight translation in the axis of the scanner ( axis), most likely due to subject neck relaxation in the beginning of the scan (cf. Appendix C).

Conclusion
A versatile, straightforward and inexpensive methodology to control head movements in an MRI scanner is proposed. A method to improve registration in case of motion faster than TR has been developed. The apparatus and protocols provided allows systematic and reproducible O. Dabrowski et al. motion-controlled experiments, which, in turn, opens means to better understand the effects of motion and provides a practical basis for simulation. The method can also be used for generating realistic motion-corrupted datasets suitable for benchmarking any retrospective motion-correction technique or training supervised machine learning algorithms.
The proposed methodology can be adapted by researchers based on their needs. For example, a different range of angles and motion types could be investigated depending on the application. In particular, it can help to study the impact of motion on different k-space regions.