
A review of deep learning based methods for medical image multi-organ segmentation

      Highlights

      • Comprehensive review of deep learning-based multi-organ segmentation.
      • Categorization of pixel-wise classification and end-to-end segmentation.
      • Pixel-wise classification includes AE and CNN.
      • End-to-end segmentation includes FCN, R-FCN, GAN and synthetic image-aided.
      • Benchmark of algorithms’ performances for thoracic and head-neck CT segmentation.

      Abstract

Deep learning has revolutionized image processing and achieved state-of-the-art performance in many medical image segmentation tasks. Many deep learning-based methods have been published to segment different parts of the body for different medical applications. It is therefore necessary to summarize the current state of development of deep learning in the field of medical image segmentation. In this paper, we aim to provide a comprehensive review with a focus on multi-organ image segmentation, which is crucial for radiotherapy, where the tumor and organs-at-risk need to be contoured for treatment planning. We grouped the surveyed methods into two broad categories: ‘pixel-wise classification’ and ‘end-to-end segmentation’. Each category was divided into subgroups according to network design. For each type, we listed the surveyed works, highlighted important contributions and identified specific challenges. Following the detailed review, we discussed the achievements, shortcomings and future potential of each category. To enable direct comparison, we listed the performance of the surveyed works that used thoracic and head-and-neck benchmark datasets.

      Introduction

Medical image segmentation is one of the most important medical image analysis tasks. It has a wide range of applications in imaging systems such as microscopy, X-ray, ultrasound, computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET). Medical image segmentation plays an essential role in radiotherapy, which is the standard of care for certain cancers [Liao et al.]. The success of radiotherapy depends highly on accurate irradiation of the target and sparing of organs-at-risk (OARs) [Molitoris et al.; Vyfhuis et al.]. Therefore, accurate structure delineation is crucial for radiotherapy, especially for highly conformal techniques such as intensity-modulated radiotherapy (IMRT), proton therapy and stereotactic body radiotherapy (SBRT). These highly conformal treatments are designed to shape radiation to the target volume while sparing healthy OARs, and are usually planned with sharp dose drop-off. Misdelineation of anatomical structures could result in severe misadministration of radiation dose to the target and OARs. In current clinical practice, structure contours are manually delineated by physicians. The manual contouring process is tedious, time-consuming and laborious. Manual delineation of soft tissues on CT images is challenging due to low soft-tissue contrast, which makes the contours prone to errors and inter-/intra-observer variability [Hurkmans et al.; Rasch et al.; Van de Steene et al.; Vinod et al.; Breunig et al.; Nelms et al.]. In the past decades, researchers have spent enormous effort developing automatic contouring methods for accurate and consistent organ delineation.
Traditional medical image segmentation [Chen and Pan; Naqa et al.; Pratondo et al.; Tsai et al.; Dong et al.] usually involves handcrafted image feature detection, such as line/edge detection, and mathematical models that trace image gradients along object boundaries, such as graph cuts, active contours and level sets. Atlas-based methods are another commonly used approach for automatic segmentation [Isgum et al.; Aljabar et al.; Iglesias and Sabuncu; Yang et al.]. Atlas-based methods propagate predefined structure contours to the images to be segmented using image registration; their segmentation accuracy therefore relies heavily on the accuracy of the registration. Model-based methods, which use statistical shape models for automated segmentation, have also been proposed [Ecabert et al.; Qazi et al.; Sun et al.]. The accuracy of these methods depends on the reliability and generalizability of the models, and models built from normal anatomical structures have shown limited success in segmenting irregular structures.
Recently, machine learning (ML) has gained substantial interest in medicine [Beam and Kohane; Pella et al.; Yang et al.]. The artificial neural network (ANN), a subfield of ML, utilizes multiple layers of connected neurons with learnable weights and biases to simulate the human brain in accomplishing high-level tasks [Bryce et al.; Gulliford et al.; Tomatis et al.; Chen et al.; Su et al.; Ochi et al.]. Deep learning (DL) is a newer term for ANNs with many hidden layers, arising from advances in ANN architectures and algorithms since 2006. Since there is no consensus on the number of layers required to count as deep, the distinction between ANN and DL is not clearly defined [Boldrini et al.]. DL has demonstrated enormous potential in computer vision [LeCun et al.]. It uses a data-driven approach to explore vast image features that facilitate various vision tasks, such as image classification [Krizhevsky et al.], object detection [Sermanet et al.] and segmentation [Shelhamer et al.]. Inspired by the success of DL in computer vision, researchers have proposed various methods to extend DL techniques to medical imaging. To date, DL has been extensively studied in medical image segmentation [Hesamian et al.; Zhou et al.; Lei et al.; Dong et al.; van der Heyden et al.; Wang et al.; Wu et al.; Fu et al.; Guo et al.; Liu et al.; He et al.; Jeong et al.; Zhang et al.; Harms et al.; Dai et al.], image synthesis [Dong et al.; Liu et al.; Lei et al.; Shafai-Erfani et al.; Yang et al.; Wang et al.; Charyyev et al.; Dai et al.; Harms et al.], image enhancement and correction [Dong et al.; Wang et al.; Harms et al.; Lei et al.; Yang et al.; Dai et al.], and registration [Fu et al.; Haskins et al.; Lei et al.; Yang et al.; Li and Fan; Zeng et al.]. DL-based multi-organ segmentation holds significant potential for daily radiotherapy practice, since it can expedite the contouring process, improve contour accuracy and consistency, and promote compliance with delineation guidelines [Dong et al.; Tong et al.; Men et al.; Kazemifar et al.; Javaid et al.; Elguindi et al.]. Furthermore, rapid DL-based multi-organ segmentation could facilitate online adaptive radiotherapy and thereby improve clinical outcomes. After studying 80 online MRI-guided adaptive radiotherapy cases, Lamb et al. reported that the median time of the adaptive process prior to beam delivery was 54 min, of which the re-contouring process took up to 22 min. To expedite contouring for adaptive radiotherapy, DL-based abdominal multi-organ segmentation has been proposed and tested on ViewRay MR images [Fu et al.]. Although the DL-based contouring took only a few minutes, manual post-correction was often needed to meet the physicians’ expectations. The time analysis reported in that study showed that the average contouring time, i.e., the automatic contouring time plus the manual correction time, was only a quarter of the total time needed to contour manually from scratch. A CT-based multi-organ segmentation covering eight organs has been proposed for pancreatic radiotherapy [Liu et al.]: the large bowel, small bowel, duodenum, left kidney, right kidney, liver, spinal cord and stomach. This CT-based method could be used with on-rail CT to facilitate fast contouring for adaptive radiotherapy. Besides CT-based segmentation, CBCT-based multi-organ segmentation has also been proposed for prostate adaptive radiotherapy [Fu et al.].
DL-based methods [Liu et al.; Tappeiner et al.; Hu et al.; Gibson et al.; Chen et al.] have achieved state-of-the-art performance in medical image segmentation, especially in multi-organ segmentation. In contrast to traditional methods that utilize handcrafted features, DL-based methods adaptively explore representative features from medical images [Wang et al.]. In this paper, we reviewed deep learning-based methods for medical image segmentation with a focus on multi-organ segmentation. We classified the methods into two broad categories: pixel-wise classification and end-to-end segmentation. Each category was reviewed in detail to study its latest developments, contributions and challenges. We provided benchmark evaluations of recently published multi-organ segmentation methods for thoracic and head-and-neck (HN) CT segmentation.

Deep learning in multi-organ segmentation

DL-based multi-organ segmentation methods can be categorized by network architecture, training process (supervised, semi-supervised, unsupervised, transfer learning), input image type (patch-based, whole-volume-based, 2D, 3D) and so on. In this paper, we first classify them into two broad categories, pixel-wise classification and end-to-end segmentation, since these two categories represent the major steps of development in DL-based image segmentation. Based on the network design, we further divide the pixel-wise classification methods into 1) Auto-Encoder (AE) and 2) Convolutional Neural Network (CNN) methods. Similarly, we divide the end-to-end segmentation methods into 1) Fully Convolutional Network (FCN), 2) Region-based FCN (R-FCN), 3) Generative Adversarial Network (GAN) and 4) synthetic image-aided segmentation. For each sub-category, we provide a comprehensive list of the surveyed works followed by a short discussion.
Works cited in this review were collected from various databases, including Google Scholar, PubMed, Web of Science and Semantic Scholar. Keywords used to search the literature included, but were not limited to, deep learning, multi-organ, medical image segmentation and convolutional neural network. Over 60 papers closely related to multi-organ segmentation were collected, most of which were published between 2017 and 2020. We also included some single-organ segmentation papers for ease of description, since many multi-organ segmentation methods were developed by replacing the last network layer with a multi-class classification layer. The number of multi-organ publications is plotted against year in Fig. 1.
Fig. 1. The number of publications for DL-based multi-organ segmentation (up to October 2020).
Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95) and mean surface distance (MSD) are often used to evaluate the performance of segmentation methods. The DSC is a measure of the volumetric overlap between the predicted and ground-truth segmentations:

$$\mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where $X$ and $Y$ are the predicted and ground-truth segmentations, respectively.
The HD95 and MSD measure the surface distances between the predicted and ground-truth segmentations. The HD95 quantifies the 95th-percentile maximum surface distance, while the MSD quantifies the mean surface distance:

$$\mathrm{HD95} = \max\left\{\, d_{95\%}(X, Y),\; d_{95\%}(Y, X) \,\right\}$$

$$\mathrm{MSD} = \frac{1}{|X| + |Y|}\left(\sum_{x \in X} d(x, Y) + \sum_{y \in Y} d(y, X)\right)$$

where $d(x, Y) = \min_{y \in Y} \lVert x - y \rVert_2$ is the distance from surface point $x$ of $X$ to the closest surface point of $Y$, $d_{95\%}(X, Y)$ denotes the 95th percentile of these distances over all $x \in X$, and $\sum_{x \in X} d(x, Y)$ is the total surface distance from $X$ to $Y$.
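For concreteness, the sketch below computes these three metrics from binary masks with SciPy. It is a minimal illustration assuming boolean NumPy arrays on an isotropic voxel grid (anisotropic spacing would need the `sampling` argument of `distance_transform_edt`), not code taken from any surveyed work.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(pred, gt):
    """Dice similarity coefficient between two non-empty boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def _surface_distances(a, b):
    """Distances from the surface voxels of mask a to the surface of mask b."""
    a_surf = a & ~binary_erosion(a)          # boundary voxels of a
    b_surf = b & ~binary_erosion(b)          # boundary voxels of b
    # distance_transform_edt gives, at each voxel, the distance to the
    # nearest zero voxel, so pass the complement of b's surface.
    dist_to_b = distance_transform_edt(~b_surf)
    return dist_to_b[a_surf]

def hd95(pred, gt):
    """95th-percentile symmetric Hausdorff distance."""
    return max(np.percentile(_surface_distances(pred, gt), 95),
               np.percentile(_surface_distances(gt, pred), 95))

def msd(pred, gt):
    """Mean surface distance, averaged over both surfaces."""
    d_pg = _surface_distances(pred, gt)
    d_gp = _surface_distances(gt, pred)
    return (d_pg.sum() + d_gp.sum()) / (len(d_pg) + len(d_gp))
```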

      Pixel-wise classification

Early DL-based methods performed image segmentation by repeatedly classifying the center pixels of sliding image patches covering the whole image. The two major types of network used for pixel-wise classification are the AE and the CNN. Fig. 2 shows the common network components of AE- and CNN-based methods.
Fig. 2. The network components of the pixel-wise classification methods.
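The sliding-patch procedure can be made concrete with a short sketch. The snippet below is illustrative only: `model` stands in for any trained patch classifier that returns per-class probabilities for the center pixel, and the 2D patch size is an assumption rather than a value from the surveyed papers.

```python
import numpy as np

def segment_by_patches(image, model, patch=33):
    """Pixel-wise classification: label each pixel by classifying
    the patch centered on it."""
    half = patch // 2
    padded = np.pad(image, half, mode="reflect")      # handle image borders
    labels = np.zeros(image.shape, dtype=np.int64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + patch, j:j + patch]
            labels[i, j] = np.argmax(model(window))   # class of center pixel
    return labels
```

The double loop makes the main drawback of this family of methods apparent: one forward pass per pixel, with heavy recomputation over overlapping patches, which is precisely what the end-to-end methods discussed later avoid.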

      Auto-Encoder

An AE consists of a neural-network encoder that maps the input to a latent representation and a decoder that restores the input from this low-dimensional representation; the network is trained by minimizing the reconstruction error between input and output. By constraining the dimension of the latent space, an AE can effectively compress the input into a patterned latent representation. To prevent the AE from learning an identity function, the stacked AE (SAE) was proposed. An SAE is constructed by stacking AEs on top of each other, where the output of each layer is wired to the input of its successor [Shin et al.]. The benefit of the SAE is the deeper network, which provides a higher level of feature representation [Shin et al.]. The denoising autoencoder (DAE) is another variant of the AE, which prevents the model from learning a trivial solution by training the network to reconstruct a clean input from a corrupted one [Alex et al.]. The stacked denoising autoencoder (SDAE) combines stacking with the denoising strategy of the DAE [Vaidhya et al.].
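As a concrete illustration, the following is a minimal denoising autoencoder in PyTorch; the layer sizes, patch size and noise level are illustrative assumptions rather than settings from any surveyed work. Stacking several such encoder/decoder pairs and training them layer by layer yields an SDAE.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_in=32 * 32, n_latent=128):
        super().__init__()
        # Encoder compresses the flattened patch into a latent code.
        self.encoder = nn.Sequential(nn.Linear(n_in, 512), nn.ReLU(),
                                     nn.Linear(512, n_latent), nn.ReLU())
        # Decoder restores the patch from the latent code.
        self.decoder = nn.Sequential(nn.Linear(n_latent, 512), nn.ReLU(),
                                     nn.Linear(512, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(64, 32 * 32)                  # a batch of flattened patches
noisy = clean + 0.1 * torch.randn_like(clean)    # corrupt the input
opt.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)  # reconstruct the clean patch
loss.backward()
opt.step()
```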

      Overview of works

An overview of AE methods is shown in Table 1. Ahmad et al. proposed a deep SAE (DSAE) for CT liver segmentation [Ahmad et al.]. First, deep features were extracted from unlabeled data using the AE. Second, these features were fine-tuned to classify the liver among other abdominal organs. An average DSC of 0.9 was achieved on 659 2D liver images. Vaidhya et al. used an SDAE to overcome the varying shape and texture of glioma tissue in MRI [Vaidhya et al.]. 3D image patches extracted from multi-sequence MRI were fed into the SDAE model to obtain the glioma segmentation. Two SDAE models were trained, one on high-grade glioma (HGG) data and the other on a combination of HGG and low-grade glioma (LGG) data. During testing, the segmentation was obtained by combining the predictions of the two networks via maximum a posteriori (MAP) estimation. The network achieved mean DSCs of 0.82 ± 0.14 and 0.72 ± 0.21 for whole-tumor segmentation on the HGG and LGG data, respectively. Alex et al. applied an SDAE to brain lesion detection, segmentation and false-positive reduction [Alex et al.]. The SDAE was pretrained on many unlabeled patient volumes and fine-tuned with 2D patches drawn from a limited number of patients. LGG segmentation was achieved using a transfer-learning approach in which the pretrained SDAE network was fine-tuned on the LGG data. The method achieved a mean DSC of 0.86 ± 0.12 for brain whole-tumor segmentation on the BraTS challenge datasets.
Table 1. Overview of AE methods.

Ref. | Year | Network | Supervision | Dimension | Site | Modality
Shin et al. | 2013 | SAE | Weakly supervised | 3D patch | Abdomen | 4D DCE-MRI
Vaidhya et al. | 2015 | SDAE | Supervised | 3D patch | Brain gliomas | MRI
Alex et al. | 2017 | SDAE | Semi-supervised | 2D patch | Brain lesion | MRI
Ahmad et al. | 2017 | SAE | Transfer learning | 2D slice | Liver | CT
Wang et al. | 2018 | CSDAE | Transfer learning | 2D slice | Thoracic | Chest X-ray
Qadri et al. | 2019 | SSAE | Unsupervised | 2D patch | Vertebrae | CT
Wang et al. | 2019 | SSAE | Unsupervised | 2D patch | Vertebrae | CT
Tappeiner et al. | 2019 | Hierarchical 3D AE | Supervised | N.A. | Head & Neck | CT
Accurate vertebrae segmentation is essential for spine assessment, surgical planning and diagnosis. Qadri et al. proposed a stacked sparse autoencoder (SSAE) model for the segmentation of vertebrae from CT images [Qadri et al.]. High-level features were extracted from 2D image patches using the SSAE model, which was trained in an unsupervised way. To improve performance, the authors fine-tuned the network using supervised training. The SSAE model was validated on the 2014 MICCAI CSI challenge datasets with an average DSC of 0.86.

      Discussion

SDAE has been shown to work for brain tumor segmentation in MRI on the public BraTS 2013 and BraTS 2015 data [Alex et al.; Menze et al.]. DSAE has shown high classification accuracy and speed for liver segmentation on CT images [Ahmad et al.]. AEs can learn deep contextual features from large-range input samples, improving their contextual discrimination ability [Wang et al.]. Validated on the 98 spine CT scans of the public MICCAI CSI 2014 dataset, the SSAE method could effectively and automatically locate and identify spinal targets in CT scans, achieving high localization accuracy without making any assumptions about the field of view of the scans [Qadri et al.].
Although AEs have many advantages, they face several challenges and limitations in medical multi-organ segmentation. One limitation relates to data regularity. AE-based segmentation methods work well for anatomical structures with small shape variability, such as the lung, heart and liver; however, it remains challenging for unsupervised AE methods to segment irregular lesions and tumors with large shape variability. The number of layers in an AE may also be limited by computational complexity: unlike CNNs, which use convolution kernels with shared learnable parameters, AE methods cannot easily be extended to a large number of layers, which limits their learning ability.

      Convolutional neural networks

A typical CNN consists of convolutional layers, activation functions, max-pooling layers, batch-normalization layers, dropout layers and fully connected layers. The last layer of a CNN is typically a sigmoid or softmax layer for classification, or a tanh layer for regression. The convolutional layers learn to extract task-dependent feature maps. Pooling layers reduce the spatial size of the feature maps via maximum/average down-sampling. Activation functions such as the rectified linear unit (ReLU) and leaky ReLU simulate neuron activation by clipping negative input values to zero and passing positive values to the connected neurons [He et al.]. Fully connected layers connect every neuron in one layer to every neuron in the next; they are placed before the final classification layer to flatten the feature maps. The final classification layer predicts the probability that the center pixel of an image patch belongs to each class.
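The components above can be assembled into a small patch classifier. The sketch below follows this generic recipe (convolution, batch normalization, ReLU, max pooling, dropout, fully connected layer); the channel counts, patch size and number of classes are illustrative assumptions, not a specific published network.

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learned feature maps
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2))                              # 16x16 -> 8x8
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # flatten feature maps
            nn.Dropout(0.5),
            nn.Linear(32 * 8 * 8, n_classes))

    def forward(self, x):
        # Returns logits; feed them to nn.CrossEntropyLoss during training.
        return self.classifier(self.features(x))

logits = PatchCNN(n_classes=5)(torch.rand(4, 1, 32, 32))  # a batch of 4 patches
probs = logits.softmax(dim=1)   # class probabilities for each center pixel
```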
During training, gradient-based optimization methods such as stochastic gradient descent (SGD) and Adam are commonly used to update the learnable parameters of the CNN through back-propagation, with cross-entropy being one of the most widely used loss functions. LeNet, composed of convolutional layers, pooling layers and fully connected layers, was first proposed by Lecun et al. to classify hand-written digits [Lecun et al.]. As computers became more powerful and more data became available for network training, Krizhevsky et al. proposed AlexNet in 2012 and won the ILSVRC-2012 image classification competition [Russakovsky et al.] by a large margin [Krizhevsky et al.]. Since the introduction of AlexNet, CNNs have gained widespread attention, leading to various CNN designs that achieved state-of-the-art performance in many image processing tasks. The improvements of AlexNet over LeNet include 1) ReLU layers for nonlinearity and sparsity, 2) data augmentation to enlarge the dataset variety, 3) dropout layers to reduce learnable parameters and prevent overfitting, 4) GPUs for parallel computing, 5) local response normalization and 6) overlapping pooling. In 2014, Zeiler and Fergus proposed ZFNet to improve on AlexNet [Zeiler and Fergus], showing that shallow layers learn edge, color and texture features while deeper layers learn abstract features, and demonstrating that better performance can be achieved with deeper networks. The main addition of ZFNet is a deconvolutional network used to visualize the feature maps. To evaluate performance with respect to network depth, Simonyan and Zisserman proposed VGG, which extends the depth to 19 layers [Simonyan and Zisserman]. GoogLeNet introduced the inception module [Szegedy et al.], which allows a broader receptive field and a deeper network, improving feature extraction; as a result, GoogLeNet won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). As networks get deeper, training becomes harder due to vanishing/exploding gradients. To alleviate this problem, He et al. proposed the residual network (ResNet), which allows even deeper networks to be trained for image recognition [He et al.]. Huang et al. later proposed the densely connected convolutional network (DenseNet), which connects each layer to every other layer [Huang et al.] in order to combine both low-frequency and high-frequency feature maps.
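ResNet's key idea is the identity shortcut: each block learns a residual F(x) and outputs F(x) + x, so gradients can flow around the convolutions. A minimal residual block, with illustrative channel counts, might look like this:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        # F(x) + x: the skip connection that eases gradient flow.
        return torch.relu(self.body(x) + x)

y = ResidualBlock(16)(torch.rand(1, 16, 32, 32))  # shape is preserved
```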

      Overview of works

      The surveyed CNN methods are listed in Table 2. Roth et al. proposed a multi-level deep CNN approach for abdominal CT image pancreas segmentation [

      Roth HR, Lu L, Farag A, Shin HC, Liu JM, Turkbey EB, et al. DeepOrgan: Multi-level Deep Convolutional Networks for Automated Pancreas Segmentation. Medical Image Computing and Computer-Assisted Intervention - Miccai 2015, Pt I. 2015;9349:556-64.

      ]. A dense local image patches and labels were obtained by extracting 2D image patches in the axial, coronal and sagittal plane using a sliding window. The proposed CNN learns to assign class probabilities to the center voxels of the image patches. The proposed CNN architecture consists of five convolutional layers followed by max-pooling layers, three fully connected layers, two dropout layers and a soft-max operator to perform binary classification. Evaluated on 82 patient’s CT images using 4-fold cross-validation, an average DSC of 0.84 ± 0.06 and 0.72 ± 0.11 was obtained for the training and testing, respectively. For volumetric datasets, it is beneficial to explore the 3D images directly rather than 2D images. Therefore, Hamidian et al. proposed to use 3D patch-based CNN to detect lung pulmonary nodules for chest CT images [
      • Hamidian S.
      • Sahiner B.
      • Petrick N.
      • Pezeshk A.
      3D Convolutional Neural Network for Automatic Detection of Lung Nodules in Chest CT. Proc SPIE Int Soc.
      ]. Volumes of interest image patches were extracted from the 3D lung image database consortium (LIDC) dataset [
      • Armato Iii S.G.
      • McLennan G.
      • Bidaut L.
      • McNitt-Gray M.F.
      • Meyer C.R.
      • Reeves A.P.
      • et al.
      The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans.
      ]. They demonstrated that 3D CNN is more suitable for volumetric CT data than 2D CNN.
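The pixel-wise classification workflow shared by these CNN methods can be summarized with a brief sketch (a simplified stand-in with illustrative layer sizes, not the exact architecture of any surveyed work): a small CNN assigns a class probability to the center voxel of each extracted patch, so segmenting a full image requires one forward pass per voxel.

import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Classifies the *center voxel* of a 2D patch (illustrative layer sizes)."""
    def __init__(self, patch_size=32, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(32 * (patch_size // 4) ** 2, n_classes)

    def forward(self, patch):               # patch: (B, 1, 32, 32)
        return self.classifier(self.features(patch).flatten(1))

# A sliding window feeds one patch per voxel; this repeated inference is the
# inefficiency discussed later in this section for pixel-wise methods.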
Table 2. Overview of CNN methods.
Ref. | Year | Network | Dimension | Site | Modality
      • Men K.
      • Chen X.Y.
      • Zhang Y.
      • Zhang T.
      • Dai J.R.
      • Yi J.L.
      • et al.
      Deep Deconvolutional Neural Network for Target Segmentation of Nasopharyngeal Cancer in Planning Computed Tomography Images. Frontiers.
2017 | Deep deconvolutional neural network (DDNN) | 2D slice | Brain | CT
      • Kamnitsas K.
      • Ledig C.
      • Newcombe V.F.J.
      • Simpson J.P.
      • Kane A.D.
      • Menon D.K.
      • et al.
      Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.
2017 | 3D CNN | 3D patch | Brain lesion | MRI

      Roth HR, Lu L, Farag A, Shin HC, Liu JM, Turkbey EB, et al. DeepOrgan: Multi-level Deep Convolutional Networks for Automated Pancreas Segmentation. Medical Image Computing and Computer-Assisted Intervention - Miccai 2015, Pt I. 2015;9349:556-64.

2015 | Multi-level DCNN | 2D patch | Pancreas | CT
      • Roth H.R.
      • Lu L.
      • Farag A.
      • Sohn A.
      • Summers R.M.
      Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation.
2016 | Holistically Nested CNN | 2D patch | Pancreas | CT
      • Hamidian S.
      • Sahiner B.
      • Petrick N.
      • Pezeshk A.
      3D Convolutional Neural Network for Automatic Detection of Lung Nodules in Chest CT. Proc SPIE Int Soc.
2017 | 3D CNN | 3D patch | Chest | CT
      • Hu P.
      • Wu F.
      • Peng J.
      • Bao Y.
      • Chen F.
      • Kong D.
      Automatic abdominal multi-organ segmentation using deep convolutional neural network and time-implicit level sets.
2017 | 3D DCNN | Not specified | Abdomen | CT
      • Ibragimov B.
      • Xing L.
      Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks.
2017 | CNN | 3D patch | Head & Neck | CT

K Jd, R G, A M. Fuzzy-C-Means Clustering Based Segmentation and CNN-Classification for Accurate Segmentation of Lung Nodules. Asian Pac J Cancer Prev. 2017;18:1869-74.

2017 | Fuzzy-C-Means CNN | 3D patch | Lung nodule | CT
      • Zhou X.R.
      • Takayama R.
      • Wang S.
      • Zhou X.X.
      • Hara T.
      • Fujita H.
      Automated segmentation of 3D anatomical structures on CT images by using a deep convolutional network based on end-to-end learning approach.
2017 | DCNN | 2D slice | Body, chest, abdomen | CT
      • Bae H.J.
      • Kim C.W.
      • Kim N.
      • Park B.
      • Kim N.
      • Seo J.B.
      • et al.
      A Perlin Noise-Based Augmentation Strategy for Deep Learning with Small Data Samples of HRCT Images.
2018 | Fusion Net | 2D patch | 100 ROIs | HRCT
      • Chmelik J.
      • Jakubicek R.
      • Walek P.
      • Jan J.
      • Ourednicek P.
      • Lambert L.
      • et al.
      Deep convolutional neural network-based segmentation and classification of difficult to define metastatic spinal lesions in 3D CT data.
2018 | DCNN | 2D patch | Spinal lesion | CT
      • Gudmundsson E.
      • Straus C.M.
      • Armato 3rd., S.G.
      Deep convolutional neural networks for the automated segmentation of malignant pleural mesothelioma on computed tomography scans.
2018 | DCNN | 2D slice | Malignant pleural mesothelioma | CT
      • Nardelli P.
      • Jimenez-Carretero D.
      • Bermejo-Pelaez D.
      • Washko G.R.
      • Rahaghi F.N.
      • Ledesma-Carbayo M.J.
      • et al.
      Pulmonary Artery-Vein Classification in CT Images Using Deep Learning.
2018 | 2D and 3D CNN | 2D slice, 3D volume | Artery/vein | CT
      • Thyreau B.
      • Sato K.
      • Fukuda H.
      • Taki Y.
      Segmentation of the hippocampus by transferring algorithmic knowledge for large cohort processing.
2018 | 3D ConvNets | 3D volume | Brain | MRI
      • Wang G.T.
      • Li W.Q.
      • Zuluaga M.A.
      • Pratt R.
      • Patel P.A.
      • Aertsen M.
      • et al.
      Interactive Medical Image Segmentation Using Deep Learning With Image-Specific Fine Tuning.
2018 | CNN with specific fine-tuning | 2D slice, 3D volume | Brain, abdomen | Fetal MRI
      • Zhou X.R.
      • Yamada K.
      • Kojima T.
      • Takayama R.
      • Wang S.
      • Zhou X.X.
      • et al.
      Performance evaluation of 2D and 3D deep learning approaches for automatic segmentation of multiple organs on CT images.
2018 | 2D and 3D DCNN | 2D slice, 3D volume | Whole body | CT
      • Liu H.
      • Wang L.
      • Nan Y.
      • Jin F.
      • Wang Q.
      • Pu J.
      SDFN: Segmentation-based deep fusion network for thoracic disease classification in chest X-ray images.
2019 | Deep fusion network | 2D slice | Chest | CXR
      • Tang Y.C.
      • Huo Y.K.
      • Xiong Y.X.
      • Moon H.
      • Assad A.
      • Moyo T.K.
      • et al.
      Improving Splenomegaly Segmentation by Learning from Heterogeneous Multi-Source Labels.
2019 | DCNN | 2D slice | Abdomen | CT
      • Yun J.
      • Park J.
      • Yu D.
      • Yi J.
      • Lee M.
      • Park H.J.
      • et al.
      Improvement of fully automated airway segmentation on volumetric computed tomographic images using a 2.5 dimensional convolutional neural net.
2019 | 2.5D CNN | 2.5D patch | Thorax | CT
      • Zhong T.
      • Huang X.
      • Tang F.
      • Liang S.J.
      • Deng X.G.
      • Zhang Y.
      Boosting-based cascaded convolutional neural networks for the segmentation of CT organs-at-risk in nasopharyngeal carcinoma.
2019 | Cascaded CNN | 2D slice | Head & Neck | CT

      Harten LDv, Noothout JMH, Verhoeff J, Wolterink JM, Išgum I. Automatic Segmentation of Organs at Risk in Thoracic CT scans by Combining 2D and 3D Convolutional Neural Networks. SegTHOR@ISBI2019.

2019 | 2D and 3D CNN | 2D slice, 3D volume | Thorax | CT
      • Zhu J.
      • Zhang J.
      • Qiu B.
      • Liu Y.
      • Liu X.
      • Chen L.
      Comparison of the automatic segmentation of multiple organs at risk in CT images of lung cancer between deep convolutional neural network-based and atlas-based techniques.
2019 | U-Net neural network | 3D patch | Lung | CT
In radiotherapy, it is common to segment multiple organs near the tumor for treatment planning. For nasopharyngeal carcinoma (NPC), it is very challenging to automatically segment the surrounding adhesion tissues of the parotids, thyroids and optic nerves due to the low soft tissue contrast of CT images. To overcome this challenge, Zhong et al. proposed to delineate these three organs for NPC radiotherapy using a boosting algorithm that cascades three CNNs [
      • Zhong T.
      • Huang X.
      • Tang F.
      • Liang S.J.
      • Deng X.G.
      • Zhang Y.
      Boosting-based cascaded convolutional neural networks for the segmentation of CT organs-at-risk in nasopharyngeal carcinoma.
      ]. The first network was trained with the traditional approach. The second one was trained on patterns (pixels) filtered by the first network. Finally, the third network was trained on the new patterns (pixels) that were jointly extracted by the first and second networks. The outputs of the three nets were combined to obtain the final output. 2D patch-based ResNet [

      He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)2016. p. 770-8.

      ] was used to build the cascaded CNNs. CT images of 140 NPC patients treated with radiotherapy were collected. Manual contours of the three organs were used as learning targets. The mean DSC values were above 0.92 for the parotids, above 0.92 for the thyroids, and above 0.89 for the optic nerves. For thoracic radiotherapy treatment, Harten et al. proposed a combination of 2D and 3D CNNs for automatic segmentation of organs including esophagus, heart, trachea, and aorta on simulation CT scans of patients diagnosed with lung, breast or esophageal cancer [

      Harten LDv, Noothout JMH, Verhoeff J, Wolterink JM, Išgum I. Automatic Segmentation of Organs at Risk in Thoracic CT scans by Combining 2D and 3D Convolutional Neural Networks. SegTHOR@ISBI2019.

]. The 3D patch-based network contains a deep stack of residual blocks [

      Johnson JM, Alahi A, Fei-Fei L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ECCV2016.

] with a sigmoid layer to perform multi-class binary classification. The 2D patch-based network (with patches extracted from the axial, coronal and sagittal planes) contains dilated convolutions [
      • Wolterink J.M.
      • Leiner T.
      • Viergever M.A.
      • Išgum I.
      Dilated Convolutional Neural Networks for Cardiovascular MR Segmentation in Congenital Heart Disease.
] with a softmax layer to perform classification. Forty scans were used for training and 20 for testing.
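The boosting-style cascade of Zhong et al. described above can be paraphrased with a small sketch: each subsequent network trains only on the pixels the previous stage found hard. The predict interface and confidence threshold below are hypothetical.

import numpy as np

def hard_pixels(net, patches, labels, threshold=0.9):
    """Keep only training pixels that `net` does not already classify
    confidently and correctly (hypothetical predict API)."""
    probs = net.predict(patches)                 # (N, n_classes) probabilities
    confident = probs.max(axis=1) >= threshold
    correct = probs.argmax(axis=1) == labels
    keep = ~(confident & correct)                # filter out the easy pixels
    return patches[keep], labels[keep]

# net2 would be trained on hard_pixels(net1, ...), net3 on pixels where
# net1 and net2 disagree; the three outputs are combined at inference.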

      Discussion

      In the study of [

      Harten LDv, Noothout JMH, Verhoeff J, Wolterink JM, Išgum I. Automatic Segmentation of Organs at Risk in Thoracic CT scans by Combining 2D and 3D Convolutional Neural Networks. SegTHOR@ISBI2019.

], researchers evaluated the performance of a 2D CNN, a 3D CNN and a combination of the two, and demonstrated that the combined network produced the best results. The DSCs of the esophagus, heart, trachea, and aorta were 0.84 ± 0.05, 0.94 ± 0.02, 0.91 ± 0.02, and 0.93 ± 0.01, respectively. These results demonstrated potential for automating segmentation of OARs in routine radiotherapy treatment planning. A major drawback of the pixel-wise classification methods is that classification needs to be performed for every pixel repeatedly. This approach is inefficient since it requires repeated forward network predictions on every voxel of the image. To make segmentation more efficient, Kamnitsas et al. proposed a dense-inference technique that predicts the segmentation of a smaller patch rather than only the center pixel [
      • Kamnitsas K.
      • Ledig C.
      • Newcombe V.F.J.
      • Simpson J.P.
      • Kane A.D.
      • Menon D.K.
      • et al.
      Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation.
]. However, this method is still relatively inefficient compared with end-to-end segmentation, which uses transposed convolution kernels to directly predict a segmentation map of the same size as the input image.

      End-to-end segmentation

      To increase the classification efficiency, end-to-end segmentation networks were proposed. The three categories of the end-to-end segmentation methods are shown in Fig. 3.
Fig. 3. The network components of the end-to-end segmentation methods.

      FCN methods

For pixel-wise classification-based methods, the center voxel of the input image is classified by fully connected layers based on the flattened feature maps that are passed down through multiple convolutional layers. Shelhamer et al. first proposed a CNN that replaces the fully connected layers with convolutional layers. Since all layers in the network are convolutional, the new network is named the fully convolutional network (FCN). Thanks to the deconvolution kernels used to up-sample the feature maps, the FCN predicts a dense segmentation map of the same size as the input image, which is referred to as 'end-to-end segmentation' [
      • Shelhamer E.
      • Long J.
      • Darrell T.
      Fully Convolutional Networks for Semantic Segmentation.
]. By using an FCN, the whole image can be segmented in just one forward network inference.
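A minimal sketch of this idea (illustrative layer sizes, not Shelhamer et al.'s exact network): every layer is convolutional, and a transposed convolution up-samples the coarse score map back to the input resolution.

import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.score = nn.Conv2d(64, n_classes, 1)   # replaces fully connected layers
        self.up = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=4, stride=4)

    def forward(self, x):                            # x: (B, 1, H, W)
        return self.up(self.score(self.encoder(x)))  # (B, n_classes, H, W)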
U-Net is one of the most well-known FCN structures for medical image segmentation; it utilizes the concepts of deconvolution and skip connections [
      • Ronneberger O.
      • Fischer P.
      • Brox T.
      U-Net: Convolutional Networks for Biomedical Image Segmentation.
]. As a variant of the FCN, the U-Net is a 19-layer-deep network that includes an encoding path and a decoding path. To preserve high-resolution spatial information, the U-Net uses long skip connections between layers of equal resolution in the encoding and decoding paths. Milletari et al. proposed a variant of U-Net, called V-Net [
      • Milletari F.
      • Navab N.
      • Ahmadi S.-A.
      V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.
]. Unlike U-Net, V-Net uses residual blocks as short skip connections between early and later convolutional layers. This architecture improves the convergence rate compared with non-residual networks such as U-Net. To cope with the class imbalance problem, V-Net used a Dice loss instead of the binary cross-entropy loss.
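The soft Dice loss can be written in a few lines; the version below is a common formulation (V-Net's original variant squares the terms in the denominator):

import torch

def dice_loss(pred, target, eps=1e-6):
    """pred: predicted probabilities in [0, 1]; target: binary mask, same shape."""
    intersection = (pred * target).sum()
    return 1 - (2 * intersection + eps) / (pred.sum() + target.sum() + eps)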
      Deep supervision is commonly used to train the FCN. The main idea of deep supervision [
      • Lei Y.
      • Tian S.
      • He X.
      • Wang T.
      • Wang B.
      • Patel P.
      • et al.
      Ultrasound prostate segmentation based on multidirectional deeply supervised V-Net.
      ,
      • Wang B.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Liu Y.
      • Patel P.
      • et al.
      Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation.
] is to provide supervision not only over the final output layer but also over the intermediate hidden layers. Extending direct supervision to multiple deep layers can enhance the network's discriminative ability. Attention gates have been used in FCNs to improve performance in image classification and segmentation [
      • Schlemper J.
      • Oktay O.
      • Schaap M.
      • Heinrich M.
      • Kainz B.
      • Glocker B.
      • et al.
      Attention gated networks: Learning to leverage salient regions in medical images.
      ] by highlighting salient features and suppressing irrelevant features for a specific task.
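As an illustration, deep supervision can be realized by summing auxiliary losses computed on intermediate decoder outputs; the weighting below is an assumption for illustration, not a value taken from the cited works.

def deeply_supervised_loss(final_logits, aux_logits_list, target, criterion,
                           aux_weight=0.4):
    """Supervise the final output and every (already up-sampled) auxiliary
    output with the same criterion, e.g. cross entropy or Dice."""
    loss = criterion(final_logits, target)
    for aux in aux_logits_list:
        loss = loss + aux_weight * criterion(aux, target)
    return loss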

      Overview of works

A list of FCN methods is shown in Table 3. Zhou et al. proposed a 2.5D FCN segmentation method to automatically segment 19 organs in whole-body CT images [
      • Zhou X.R.
      • Takayama R.
      • Wang S.
      • Hara T.
      • Fujita H.
      Deep learning of the sectional appearances of 3D CT images for anatomical structure segmentation based on an FCN voting method.
]. In this work, 2.5D image patches, which consist of several consecutive axial slices, were used as multi-channel input for the FCN. Individual FCNs were also trained for the coronal and sagittal views, resulting in a total of three FCNs. The final segmentation was obtained by voting among the three networks. Transrectal ultrasound (TRUS) is commonly used in image-guided prostate cancer interventions (e.g., biopsy and brachytherapy). Accurate segmentation of the prostate is very important for biopsy needle placement, brachytherapy treatment planning, and motion management. However, prostate segmentation in TRUS images is challenging due to low image contrast and image noise. Lei et al. proposed a deeply supervised V-Net for accurate prostate segmentation [
      • Lei Y.
      • Tian S.
      • He X.
      • Wang T.
      • Wang B.
      • Patel P.
      • et al.
      Ultrasound prostate segmentation based on multidirectional deeply supervised V-Net.
]. A deep supervision strategy with a hybrid loss function (logistic and Dice loss) was used at different stages of the decoding path. To improve the segmentation accuracy at the prostate apex and base, a multi-directional contour refinement model was introduced to fuse the transverse, sagittal and coronal plane-based segmentations. Tested on 44 patients' TRUS images, this method achieved a mean DSC of 0.92 ± 0.03 for prostate segmentation. Wang et al. proposed a 3D FCN with deep supervision and group dilated convolution to segment the prostate on MRI [
      • Wang B.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Liu Y.
      • Patel P.
      • et al.
      Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation.
]. In this method, a deep supervision mechanism was introduced into the FCN to alleviate the common gradient exploding and vanishing problems in training deep models. A group dilated convolution, which aggregates multi-scale contextual information for dense prediction, was proposed to enlarge the effective receptive field. A combined loss including cosine and cross-entropy terms was used to improve the segmentation accuracy. Tested on 40 patients' T2 MR images, this method achieved a mean DSC of 0.86 ± 0.04 for prostate segmentation.
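For reference, 2.5D inputs such as those used by Zhou et al. above can be formed by stacking neighboring axial slices as channels; the helper below is a sketch of one plausible construction, since exact details vary across papers.

import numpy as np

def make_25d_input(volume, slice_idx, k=3):
    """volume: (D, H, W) array. Returns k consecutive slices centered on
    slice_idx (edge slices repeated), used as a k-channel 2D input."""
    half = k // 2
    idx = np.clip(np.arange(slice_idx - half, slice_idx + half + 1),
                  0, volume.shape[0] - 1)
    return volume[idx]          # shape: (k, H, W)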
Table 3. Overview of FCN methods.
Ref. | Year | Network | Dimension | Site | Modality
      • Ronneberger O.
      • Fischer P.
      • Brox T.
      U-Net: Convolutional Networks for Biomedical Image Segmentation.
2015 | U-Net | 2D slice | Neuronal structure | Electron microscopic

      Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. MICCAI2016.

2016 | 3D U-Net | 3D volume | Kidney | Microscopic
      • Men K.
      • Dai J.R.
      • Li Y.X.
      Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks.
2017 | Dilated FCN | 2D slice | Abdomen | CT
      • Oda M.
      • Shimizu N.
      • Roth H.R.
      • Karasawa K.
      • Kitasaka T.
      • Misawa K.
      • et al.
      3D FCN Feature Driven Regression Forest-Based Pancreas Localization and Segmentation.
2017 | 3D FCN feature-driven regression forest | 3D patch | Pancreas | CT
      • Zhou X.R.
      • Takayama R.
      • Wang S.
      • Hara T.
      • Fujita H.
      Deep learning of the sectional appearances of 3D CT images for anatomical structure segmentation based on an FCN voting method.
2017 | 2D FCN | 2.5D slices | Whole body | CT
      • Brosch T.
      • Saalbach A.
      Foveal Fully Convolutional Nets for Multi-Organ Segmentation.
2018 | Foveal fully convolutional nets | N.A.* | Whole body | CT
      • Chen L.
      • Bentley P.
      • Mori K.
      • Misawa K.
      • Fujiwara M.
      • Rueckert D.
      DRINet for Medical Image Segmentation.
2018 | DRINet | 2D slice | Brain, abdomen | CT
      • Gelder Ad.
      • Huisman H.J.A.
      Autoencoders for Multi-Label Prostate MR Segmentation.
2018 | 3D U-Net | 3D volume | Prostate | MRI
      • Gibson E.
      • Giganti F.
      • Hu Y.
      • Bonmati E.
      • Bandula S.
      • Gurusamy K.
      • et al.
      Automatic Multi-Organ Segmentation on Abdominal CT With Dense V-Networks.
2018 | Dense V-Net | 3D volume | Abdomen | CT
      • Gibson E.
      • Li W.Q.
      • Sudre C.
      • Fidon L.
      • Shakir D.I.
      • Wang G.T.
      • et al.
      NiftyNet: a deep-learning platform for medical imaging.
2018 | NiftyNet | 3D volume | Abdomen | CT
      • Gonzalez G.
      • Washko G.R.
      • Estepar R.S.
      Multi-structure Segmentation from Partially Labeled Datasets. Application to Body Composition Measurements on CT Scans.
2018 | PU-Net, CU-Net | 2D slice | Pelvis | CT
      • Javaid U.
      • Dasnoy D.
      • Lee J.A.
      Multi-organ Segmentation of Chest CT Images in Radiation Oncology: Comparison of Standard and Dilated UNet.
2018 | Dilated U-Net | 2D slice | Chest | CT

      Kakeya H, Okada T, Oshiro Y. 3D U-JAPA-Net: Mixture of Convolutional Networks for Abdominal Multi-organ CT Segmentation. Medical Image Computing and Computer Assisted Intervention - Miccai 2018, Pt Iv. 2018;11073:426-33.

2018 | 3D U-JAPA-Net | 3D volume | Abdomen | CT
      • Kazemifar S.
      • Balagopal A.
      • Nguyen D.
      • McGuire S.
      • Hannan R.
      • Jiang S.
      • et al.
      Segmentation of the prostate and organs at risk in male pelvic CT images using deep learning.
2018 | U-Net | 2D slice | Pelvis | CT

      Roth HR, Shen C, Oda H, Sugino T, Oda M, Hayashi Y, et al. A Multi-scale Pyramid of 3D Fully Convolutional Networks for Abdominal Multi-organ Segmentation. Medical Image Computing and Computer Assisted Intervention - Miccai 2018, Pt Iv. 2018;11073:417-25.

2018 | Multi-scale pyramid of 3D FCNs | 3D patch | Abdomen | CT
      • Tong N.
      • Gou S.
      • Yang S.
      • Ruan D.
      • Sheng K.
      Fully automatic multi-organ segmentation for head and neck cancer radiotherapy using shape representation model constrained fully convolutional neural networks.
2018 | Shape representation model constrained FCN | 3D volume | Head & Neck | CT

      Zhou SH, Nie D, Adeli E, Gao YZ, Wang L, Yin JP, et al. Fine-Grained Segmentation Using Hierarchical Dilated Neural Networks. Medical Image Computing and Computer Assisted Intervention - Miccai 2018, Pt Iv. 2018;11073:488-96.

2018 | Hierarchical dilated neural networks | 2D slice | Pelvis | CT
      • Fu Y.B.
      • Mazur T.R.
      • Wu X.
      • Liu S.
      • Chang X.
      • Lu Y.G.
      • et al.
      A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy.
2018 | Dense 3D FCN | 3D volume | Abdomen | MRI

      Vandewinckele L, Willems S, Robben D, Veen JVD, Crijns W, Nuyts S, et al. Segmentation of head-and-neck organs-at-risk in longitudinal CT scans combining deformable registrations and convolutional neural networks. Computer methods in biomechanics and biomedical engineering Imaging & visualization. 2019:1-10.

2018 | 3D FCN | 3D patch | Head & Neck | CT
      • Anthimopoulos M.
      • Christodoulidis S.
      • Ebner L.
      • Geiser T.
      • Christe A.
      • Mougiakakou S.
      Semantic Segmentation of Pathological Lung Tissue With Dilated Fully Convolutional Networks.
2019 | Dilated FCN | 2D slice | Lung | CT
      • Binder T.
      • Tantaoui E.
      • Pati P.
      • Catena R.
      • Set-Aghayan A.
      • Gabrani M.
      Multi-Organ Gland Segmentation Using Deep Learning.
2019 | Dense-U-Net | 2D slice | Head & Neck | Stained colon adenocarcinoma dataset
      • Chen G.
      • Zhang J.
      • Zhuo D.
      • Pan Y.
      • Pang C.
      Identification of pulmonary nodules via CT images with hierarchical fully convolutional networks.
2019 | 2D and 3D FCNs | 2D slice and 3D volume | Pulmonary nodule | CT
      • Chen S.
      • Zhong X.
      • Hu S.
      • Dorn S.
      • Kachelriess M.
      • Lell M.
      • et al.
      Automatic multi-organ segmentation in dual-energy CT (DECT) with dedicated 3D fully convolutional DECT networks.
2019 | Dedicated 3D FCN | 3D patch | Thorax/abdomen | DECT
      • Elguindi S.
      • Zelefsky M.J.
      • Jiang J.
      • Veeraraghavan H.
      • Deasy J.O.
      • Hunt M.A.
      • et al.
      Deep learning-based auto-segmentation of targets and organs-at-risk for magnetic resonance imaging only planning of prostate radiotherapy.
2019 | 2D FCN (DeepLabV3+) | 2D slice | Pelvis | MRI
      • Gu X.
      • Wang J.
      • Zhao J.
      • Li Q.
      Segmentation and suppression of pulmonary vessels in low-dose chest CT scans.
2019 | 2D FCN | 2D patch | Pulmonary vessels | CT
      • Li X.L.
      • Wang Y.Y.
      • Tang Q.S.
      • Fan Z.
      • Yu J.H.
      Dual U-Net for the Segmentation of Overlapping Glioma Nuclei.
2019 | Dual U-Net | 2D slice | Glioma nuclei | Hematoxylin and eosin (H&E)-stained histopathological images
      • Nguyen N.Q.
      • Lee S.W.
      Robust Boundary Segmentation in Medical Images Using a Consecutive Deep Encoder-Decoder Network.
2019 | Consecutive deep encoder-decoder network | 2D slice | Skin lesion | CT
      • Park B.
      • Park H.
      • Lee S.M.
      • Seo J.B.
      • Kim N.
      Lung Segmentation on HRCT and Volumetric CT for Diffuse Interstitial Lung Disease Using Deep Convolutional Neural Networks.
2019 | U-Net | 2D slice | Lung | HRCT
      • Park J.
      • Yun J.
      • Kim N.
      • Park B.
      • Cho Y.
      • Park H.J.
      • et al.
      Fully Automated Lung Lobe Segmentation in Volumetric Chest CT with 3D U-Net: Validation with Intra- and Extra-Datasets.
2019 | 3D U-Net | 3D volume | Chest | CT
      • van der Heyden B.
      • Wohlfahrt P.
      • Eekers D.B.P.
      • Richter C.
      • Terhaag K.
      • Troost E.G.C.
      • et al.
      Dual-energy CT for automatic organs-at-risk segmentation in brain-tumor patients using a multi-atlas and deep-learning approach.
2019 | 3D U-Net with multi-atlas | 3D volume | Brain tumor | Dual-energy CT
      • Xu X.A.N.
      • Zhou F.G.
      • Liu B.
      • Bai X.Z.
      Multiple Organ Localization in CT Image Using Triple-Branch Fully Convolutional Networks.
2019 | Triple-branch FCN | Not specified | Abdomen/torso | CT
      • Lei Y.
      • Tian S.
      • He X.
      • Wang T.
      • Wang B.
      • Patel P.
      • et al.
      Ultrasound prostate segmentation based on multidirectional deeply supervised V-Net.
2019 | 2.5D deeply supervised V-Net | 2.5D patch | Prostate | Ultrasound
      • Wang B.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Liu Y.
      • Patel P.
      • et al.
      Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation.
2019 | Group dilated deeply supervised FCN | 3D volume | Prostate | MRI
      • Wang T.
      • Lei Y.
      • Tian S.
      • Jiang X.
      • Zhou J.
      • Liu T.
      • et al.
      Learning-based automatic segmentation of arteriovenous malformations on contrast CT images in brain stereotactic radiosurgery.
2019 | 3D FCN | 3D volume | Arteriovenous malformations | Contrast-enhanced CT

      Wang T, Lei Y, Tang H, He Z, Castillo R, Wang C, et al. A learning-based automatic segmentation and quantification method on left ventricle in gated myocardial perfusion SPECT imaging: A feasibility study. J Nucl Cardiol. 2019;In press, doi: 10.1007/s12350-019-01594-2.

2019 | 3D FCN | 3D volume | Left ventricle | SPECT
      • Wu J.
      • Xin J.
      • Yang X.
      • Sun J.
      • Xu D.
      • Zheng N.
      • et al.
      Deep morphology aided diagnosis network for segmentation of carotid artery vessel wall and diagnosis of carotid atherosclerosis on black-blood vessel wall MRI.
2019 | DeepMAD | 2.5D patch | Vessel wall | MRI
      • van Rooij W.
      • Dahele M.
      • Brandao H.R.
      • Delaney A.R.
      • Slotman B.J.
      • Verbakel W.F.
      Deep Learning-Based Delineation of Head and Neck Organs at Risk: Geometric and Dosimetric Evaluation.
2019 | 3D U-Net | 3D volume | Head & Neck | CT
      • Heinrich M.P.
      • Oktay O.
      • Bouteldja N.
      OBELISK-Net: Fewer layers to solve 3D multi-organ segmentation with sparse deformable convolutions.
2019 | OBELISK-Net | 3D volume | Abdomen | CT
      • Wang Y.
      • Zhou Y.
      • Shen W.
      • Park S.
      • Fishman E.
      • Yuille A.
      Abdominal multi-organ segmentation with organ-attention networks and statistical fusion.
2019 | OAN-RC | 2D slice | Abdomen | CT
      • Chan J.
      • Kearney V.
      • Haaf S.
      • Wu S.
      • Bogdanov M.
      • Reddick M.
      • et al.
      A convolutional neural network algorithm for automatic segmentation of head and neck organs at risk using deep lifelong learning.
2019 | Multi-stage 3D FCN | 3D volume | Head & Neck | CT
      • Zhou Y.
      • Li Z.
      • Bai S.
      • Wang C.
      • Chen X.
      • Han M.
      • et al.
      Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation.
2019 | 2D/3D FCN | 3D patch | Abdomen | CT
      • Kim H.
      • Jung J.
      • Kim J.
      • Cho B.
      • Kwak J.
      • Jang J.Y.
      • et al.
      Abdominal multi-organ auto-segmentation using 3D-patch-based deep convolutional neural network.
2020 | U-Net | 3D patch | Abdomen | CT
      • Fu Y.
      • Ippolito J.
      • Ludwig D.R.
      • Nizamuddin R.
      • Li H.
      • Yang D.
      Automatic segmentation of CT images for ventral body composition analysis.
2020 | 2.5D U-Net | 2.5D patch | Body | CT
      • Liu Y.
      • Lei Y.
      • Fu Y.
      • Wang T.
      • Tang X.
      • Jiang X.
      • et al.
      CT-based Multi-organ Segmentation using a 3D Self-attention U-Net Network for Pancreatic Radiotherapy.
2020 | 3D attention U-Net | 3D patch | Pancreas/abdomen | CT
      • Peng Z.
      • Fang X.
      • Yan P.
      • Shan H.
      • Liu T.
      • Pei X.
      • et al.
      A Method of Rapid Quantification of Patient-Specific Organ Doses for CT Using Deep-Learning based Multi-Organ Segmentation and GPU-accelerated Monte Carlo Dose Computing.
2020 | 3D U-Net | 3D patch | Thorax/abdomen | CT
      • Gou S-p
      • Tong N.
      • Qi S.
      • Yang S.
      • Chin R.
      • Sheng K.
      Self-channel-and-spatial-attention neural network for automated multi-organ segmentation on head and neck CT images.
2020 | 3D U-Net | 3D volume | Head & Neck | CT
Gland segmentation is essential in cancer diagnosis. However, accurate automated DL-based segmentation of glands is challenging due to the large variability in glandular morphology across tissues and pathological subtypes. Many accurate gland annotations are required for network training. Binder et al. investigated cross-domain (organ-type) approximation to reduce the need for organ-specific annotations [
      • Binder T.
      • Tantaoui E.
      • Pati P.
      • Catena R.
      • Set-Aghayan A.
      • Gabrani M.
      Multi-Organ Gland Segmentation Using Deep Learning.
]. Two proposed Dense-U-Nets were trained on hematoxylin and eosin (H&E) stained colon adenocarcinoma samples, focusing on gland and stroma segmentation. Unlike U-Net, Dense-U-Nets use an asymmetric encoder and decoder. The encoder is designed to automatically and adaptively learn the spatial hierarchies of features, from low- to high-level patterns coded within the image, applying transition layers (convolutions with stride 2) and dense convolution blocks consecutively to extract the encoded feature representation. The dense convolution blocks from DenseNet [

      Huang G, Liu Z, Maaten Lvd, Weinberger KQ. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)2017. p. 2261-9.

] are used to strengthen feature propagation, encourage feature reuse and substantially reduce the total number of required parameters in the network. The decoder is composed of deconvolution layers and convolution blocks. The skip connections between the encoder and the decoder allow for feature reuse. The architecture has two decoders, one to predict the gland locations and a second to predict the gland contours; the decoders thus output a gland probability map and a contour probability map. The network is supervised to predict both the gland locations and the gland contours. The model trained with the gland approach achieved DSCs of 0.92 and 0.78 on the colon and breast test datasets, respectively.
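The dual-decoder design can be sketched as one shared encoder feeding two heads, one producing the gland probability map and one the contour probability map (layer sizes below are illustrative, not Binder et al.'s exact configuration):

import torch.nn as nn

class DualDecoderNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        def head():
            return nn.Sequential(
                nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                nn.Conv2d(16, 1, 1), nn.Sigmoid())
        self.gland_head, self.contour_head = head(), head()

    def forward(self, x):
        f = self.encoder(x)
        return self.gland_head(f), self.contour_head(f)  # two probability maps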

      Discussion

The FCN usually has a fixed receptive field, which makes it struggle to detect targets whose sizes vary. One solution is multi-scale networks, where input images are resized before being fed to the network. Multi-scale techniques can alleviate the problem caused by the fixed receptive field of the FCN [

      Roth HR, Shen C, Oda H, Sugino T, Oda M, Hayashi Y, et al. A Multi-scale Pyramid of 3D Fully Convolutional Networks for Abdominal Multi-organ Segmentation. Medical Image Computing and Computer Assisted Intervention - Miccai 2018, Pt Iv. 2018;11073:417-25.

]. However, sharing the parameters of the same network across resized images may not be very effective, as objects of different scales require different parameters to process. Another solution is to perform multiple predictions with a sliding window across the entire image when the receptive field is smaller than the image to be segmented [
      • Hamidian S.
      • Sahiner B.
      • Petrick N.
      • Pezeshk A.
      3D Convolutional Neural Network for Automatic Detection of Lung Nodules in Chest CT. Proc SPIE Int Soc.
      ].
      Multiscale FCN-based segmentation [
      • Lei Y.
      • Tian S.
      • He X.
      • Wang T.
      • Wang B.
      • Patel P.
      • et al.
      Ultrasound prostate segmentation based on multidirectional deeply supervised V-Net.
      ,
      • Wang B.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Liu Y.
      • Patel P.
      • et al.
      Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation.
] can achieve good performance at the cost of higher computational complexity than the U-Net and V-Net methods. A problem of using 2.5D patch images as input is that the segmented contours of the axial, coronal and sagittal planes may not perfectly align. Though the FCN methods segment the object in an end-to-end fashion, each voxel is classified independently. For example, the pixel-wise cascaded CNN [
      • Zhong T.
      • Huang X.
      • Tang F.
      • Liang S.J.
      • Deng X.G.
      • Zhang Y.
      Boosting-based cascaded convolutional neural networks for the segmentation of CT organs-at-risk in nasopharyngeal carcinoma.
      ] outperformed U-Net [
      • Ronneberger O.
      • Fischer P.
      • Brox T.
      U-Net: Convolutional Networks for Biomedical Image Segmentation.
] in segmenting three OARs. This could be because the U-Net lacks spatial relationship modeling among voxels, which can result in implausible object shapes. Therefore, post-processing methods such as conditional random fields and graph cuts are often adopted to refine the results [
      • Fu Y.B.
      • Mazur T.R.
      • Wu X.
      • Liu S.
      • Chang X.
      • Lu Y.G.
      • et al.
      A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy.
      ].

      Region-based FCN (R-FCN)

Multi-organ segmentation is more challenging than single-object segmentation in that multi-class classification is more difficult than binary classification. To improve the segmentation accuracy, multi-organ segmentation can be divided into two steps: 1) localizing the targets of interest and 2) performing binary classification for each target separately. We call this type of method region-based FCN (R-FCN).
The cascaded FCN is one type of R-FCN that stacks two FCNs: the first FCN locates the targets of interest and the second performs binary classification for each target, as sketched below. The cascaded FCN can help alleviate the class imbalance problem of 3D FCNs because the first FCN localizes the target and thereby balances foreground and background.
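A minimal sketch of the two-stage pipeline, with both stages as placeholder networks (the cropping margin and tensor shapes are our own assumptions):

import torch

def cascaded_segment(image, locator_net, refiner_net, margin=8):
    """image: (1, 1, H, W). Stage 1 coarsely localizes the organ; stage 2
    segments within the cropped ROI; the result is pasted back full-size."""
    coarse = locator_net(image).argmax(dim=1)[0]          # (H, W) label map
    ys, xs = coarse.nonzero(as_tuple=True)
    if ys.numel() == 0:                                   # nothing detected
        return torch.zeros_like(coarse)
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin, coarse.shape[0])
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, coarse.shape[1])
    fine = refiner_net(image[..., y0:y1, x0:x1]).argmax(dim=1)[0]
    out = torch.zeros_like(coarse)
    out[y0:y1, x0:x1] = fine                              # restore to full size
    return out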
Another type of R-FCN takes a different approach to locating the ROIs: region proposal networks are integrated into the FCN [

      Girshick RB, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. IEEE Conference on Computer Vision Pattern Recognition. 2013:580-7.

      ]. A selective search [
      • Uijlings J.R.R.
• van de Sande K.
      • Gevers T.
      • Smeulders A.W.M.
      Selective Search for Object Recognition.
] method was used to extract many candidate regions from the image. The locations of the region proposals were represented by multi-scale bounding boxes. After training, the network can predict offsets and scales for the bounding boxes, refining their locations and sizes to better encompass the targets of interest. However, the large number of region proposals makes the network computationally demanding. To expedite the region detection process, Fast R-CNN [

Girshick RB. Fast R-CNN. IEEE International Conference on Computer Vision. 2015. p. 1440-8.

      ] was proposed. The Fast R-CNN used a backbone network to identify the regional proposals, which were then processed using ROI pooling layers. Unlike R-CNN, the Fast R-CNN does not need to feed many region proposals to the network for each feeding image. Instead, the convolution operation is performed only once per image in Fast R-CNN. Both R-CNN and Fast R-CNN use selective search to identify the region proposals, which can be time-consuming. To make the algorithm faster, Ren et al. proposed Faster R-CNN [
      • Ren S.
      • He K.
      • Girshick R.B.
      • Sun J.
      Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
] to replace the selective search with learnable network proposals. A separate network was used to predict the region proposals, which were then reshaped using ROI pooling layers for bounding box refinement and classification. Fast and Faster R-CNN were designed only for object detection and localization. To additionally perform segmentation within the detected bounding box, He et al. proposed Mask R-CNN, which integrates two more convolution layers to perform semantic segmentation within the bounding box [

He K, Gkioxari G, Dollár P, Girshick RB. Mask R-CNN. IEEE International Conference on Computer Vision. 2017. p. 2980-8.

]. One major contribution of Mask R-CNN is the introduction of ROI align, which produces target feature maps of consistent size without the quantization misalignment of ROI pooling, enabling better image segmentation.
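For orientation, torchvision ships a reference Mask R-CNN implementation; the snippet below only demonstrates its per-instance output format (boxes, labels, scores, masks) on random data, and the keyword arguments may differ across torchvision versions.

import torch
import torchvision

# Untrained reference model with 2 classes (background + one foreground class).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None,
                                                           num_classes=2)
model.eval()
with torch.no_grad():
    preds = model([torch.rand(3, 512, 512)])   # takes a list of CHW images
print(preds[0].keys())   # dict with 'boxes', 'labels', 'scores', 'masks'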

      Overview of works

The region-based FCN methods are shown in Table 4. Christ et al. performed liver lesion segmentation using cascaded FCNs, where the first FCN detects the liver location and the second FCN extracts features from the detected ROI to segment the liver lesions [

      Christ PF, Elshaer MEA, Ettlinger F, Tatavarty S, Bickel M, Bilic P, et al. Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. MICCAI2016.

]. A Dice score of 0.823 was achieved for lesion segmentation in CT images and 0.85 in MR images. Similarly, Wu et al. investigated the cascaded FCN to improve the performance of fetal boundary detection in ultrasound images [

Wu L, Xin Y, Li S, Wang T, Heng P, Ni D. Cascaded Fully Convolutional Networks for automatic prenatal ultrasound image segmentation. 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). 2017. p. 663-6.

]. Their results showed better performance compared with other boundary refinement techniques. The cascaded FCN usually outperforms a single FCN since separate sets of filters can be learned and applied to each ROI. Trullo et al. proposed two collaborative FCNs to jointly segment multiple organs in thoracic CT images, one used for organ localization and the other for segmenting the organ within that ROI [
      • Trullo R.
      • Petitjean C.
      • Nie D.
      • Shen D.G.
      • Ruan S.
      Joint Segmentation of Multiple Thoracic Organs in CT Images with Two Collaborative Deep Architectures.
]. A drawback of the cascaded FCN is that the performance of the second FCN largely depends on how accurately the first FCN localizes the target of interest. Due to the two-step process, the cascaded FCN also usually takes longer to train and to segment.
Table 4. Overview of region-based FCN methods.
Ref. | Year | Network | Dimension | Site | Modality

      Christ PF, Elshaer MEA, Ettlinger F, Tatavarty S, Bickel M, Bilic P, et al. Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. MICCAI2016.

2016 | Cascaded FCN | 3D volume | Liver and lesion | CT
      • Chen S.
      • Roth H.
      • Dorn S.
      • May M.
      • Cavallaro A.
      • Lell M.
      • et al.
      Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network.
2017 | 3D cascaded U-Net | 3D volume | Abdomen | DECT
      • Roth H.R.
      • Oda H.
      • Zhou X.R.
      • Shimizu N.
      • Yang Y.
      • Hayashi Y.
      • et al.
      An application of cascaded 3D fully convolutional networks for medical image segmentation.
2018 | Cascaded 3D FCN | 3D patch | Abdomen | CT

Liu M, Dong J, Dong X, Yu H, Qi L. Segmentation of Lung Nodule in CT Images Based on Mask R-CNN. 9th International Conference on Awareness Science and Technology. 2018. p. 1-6.

2018 | Mask R-CNN | 2D slice | Lung nodule | CT
      • He K.
      • Cao X.
      • Shi Y.
      • Nie D.
      • Gao Y.
      • Shen D.
      Pelvic Organ Segmentation Using Distinctive Curve Guided Fully Convolutional Networks.
2019 | 3D FCN | 3D patch | Pelvic organs | CT
      • Xu Z.
      • Wu Z.
• Feng J.
CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation.
2018 | Combination of Faster R-CNN and U-Net (CFUN) | 3D volume | Cardiac | CT
      • Bouget D.
      • Jorgensen A.
      • Kiss G.
      • Leira H.O.
      • Lango T.
      Semantic segmentation and detection of mediastinal lymph nodes and anatomical structures in CT data for lung cancer staging.
2019 | Combination of U-Net and Mask R-CNN | 2D slice | Chest | CT
      • Huang X.
      • Sun W.
      • Tseng T.B.
      • Li C.
      • Qian W.
      Fast and fully-automated detection and segmentation of pulmonary nodules in thoracic CT scans using deep convolutional neural networks.
2019 | Faster R-CNN | 2D slice | Thorax/pulmonary nodule | CT
      • Kopelowitz E.
      • Engelhard G.
      Lung Nodules Detection and Segmentation Using 3D Mask-RCNN.
2019 | 3D Mask R-CNN | 3D volume | Lung nodule | CT
      • Li Y.
      • Zhang L.
      • Chen H.
      • Yang N.
      Lung Nodule Detection With Deep Learning in 3D Thoracic MR Images.
2019 | 3D Faster R-CNN | 3D volume | Thorax/lung nodule | MRI
      • Wessel J.
      • Heinrich M.P.
      • Jv Berg
      • Franz A.
      • Saalbach A.
      Sequential Rib Labeling and Segmentation in Chest X-Ray using Mask R-CNN.
2019 | Mask R-CNN | N.A.* | Chest | X-Ray
      • Xu X.
      • Zhou F.
      • Liu B.
      • Fu D.
      • Bai X.
      Efficient Multiple Organ Localization in CT Image using 3D Region Proposal Network.
2019 | 3D RPN | 3D volume | Whole body | CT
      • Zhang R.
      • Cheng C.
      • Zhao X.
      • Li X.
      Multiscale Mask R-CNN-Based Lung Tumor Detection Using PET Imaging.
2019 | Multiscale Mask R-CNN | 2D slice | Lung tumor | PET
      • Wang S.
      • He K.
      • Nie D.
      • Zhou S.
      • Gao Y.
      • Shen D.
      CT male pelvic organ segmentation using fully convolutional networks with boundary sensitive representation.
2019 | 2.5D U-Net | 3D patch | Pelvic organs | CT
      • Zhang L.
      • Zhang J.
      • Shen P.
      • Zhu G.
      • Li P.
      • Lu X.
      • et al.
      Block Level Skip Connections Across Cascaded V-Net for Multi-Organ Segmentation.
2020 | 3D dense V-Net | 3D volume | Thorax/abdomen | CT
      • Liang S.
      • Thung K.-H.
      • Nie D.
      • Zhang Y.
      • Shen D.
      Multi-View Spatial Aggregation Framework for Joint Localization and Segmentation of Organs at Risk in Head and Neck CT Images.
2020 | 2.5D CNN | 2.5D | Head & Neck | CT
      *N.A.: not available, i.e. not explicitly indicated in the publication
Xu et al. proposed an efficient detection method for multi-organ localization in CT images using a 3D region proposal network (RPN) [
      • Xu X.
      • Zhou F.
      • Liu B.
      • Fu D.
      • Bai X.
      Efficient Multiple Organ Localization in CT Image using 3D Region Proposal Network.
]. Since the proposed RPN is implemented in a 3D manner, it can take advantage of the spatial context information in CT images. AlexNet was used to build the backbone architecture, which can generate high-resolution feature maps to further improve the localization performance for small organs. The method was evaluated on abdominal and brain datasets and achieved high detection precision and localization accuracy with fast inference speed. Xu et al. proposed a novel heart segmentation framework, called CFUN, which combines Faster R-CNN and U-Net [
      • Xu Z.
      • Wu Z.
• Feng J.
CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation.
]. CFUN can detect and segment the whole heart with good results at reduced computational cost. CFUN introduces a new loss function based on edge information, named 3D Edge-loss, to accelerate network training and improve the segmentation results. The proposed CFUN takes less than 15 s to segment the heart, with an average DSC of 0.86 on the MM-WHS2017 challenge datasets. Similarly, Bouget et al. proposed a combination of Mask R-CNN and U-Net for the segmentation and detection of mediastinal lymph nodes and anatomical structures in CT data for lung cancer staging [
      • Bouget D.
      • Jorgensen A.
      • Kiss G.
      • Leira H.O.
      • Lango T.
      Semantic segmentation and detection of mediastinal lymph nodes and anatomical structures in CT data for lung cancer staging.
      ]. Li et al. proposed a lung nodule detection method based on Faster R-CNN for thoracic MRI in a transfer learning manner [
      • Li Y.
      • Zhang L.
      • Chen H.
      • Yang N.
      Lung Nodule Detection With Deep Learning in 3D Thoracic MR Images.
]. A false positive (FP) reduction scheme based on anatomical characteristics was designed to reduce FPs while preserving true nodules. Similarly, Faster R-CNN was also used for pulmonary nodule detection on CT images [
      • Huang X.
      • Sun W.
      • Tseng T.B.
      • Li C.
      • Qian W.
      Fast and fully-automated detection and segmentation of pulmonary nodules in thoracic CT scans using deep convolutional neural networks.
      ].

      Discussion

The region-based FCN methods are useful tools for multi-organ segmentation and detection tasks. One drawback of the cascaded FCN is that it is a two-step process, which may slow down the segmentation. The segmentation accuracy also largely depends on the accuracy of the region localization. The introduction of region proposal networks such as Faster R-CNN has made progress toward fast bounding box detection and localization. However, small and low-contrast targets such as nodules and the esophagus may be missed by the region proposal network, resulting in erroneous segmentation. Due to the higher data dimensionality and larger number of weight parameters, training 3D R-FCN-based models is more time-consuming than training their 2D counterparts. However, significant advantages such as higher localization and segmentation accuracy still encourage handling this problem in 3D. To speed up training, one potential solution is to apply batch normalization after each convolutional layer in the backbone network to improve model convergence, and to conduct most calculations in parallel on the GPU [
      • Xu X.
      • Zhou F.
      • Liu B.
      • Fu D.
      • Bai X.
      Efficient Multiple Organ Localization in CT Image using 3D Region Proposal Network.
      ].

GAN

      GAN has gained a lot of attention in image processing due to its data generation capability without explicitly modelling the probability density function. A typical GAN consists of two competing networks, a generator and a discriminator [

      Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Nets. NIPS2014.

      ]. The generator is trained to generate artificial data that approximate the target data distribution. The discriminator is trained to distinguish the artificial data from the true data. The discriminator encourages the generator to predict realistic data by penalizing unrealistic predictions. The adversarial loss could be considered as a trainable network-based loss term. The two networks compete in a zero-sum game [
      • Yi X.
      • Walia E.
      • Babyn P.
      Generative Adversarial Network in Medical Imaging: A Review.
      ]. GAN has been shown to be useful in many applications, such as image reconstruction [

      Ying X, Guo H, Ma K, Wu JY, Weng Z, Zheng Y. X2CT-GAN: Reconstructing CT from Biplanar X-Rays with Generative Adversarial Networks. CVPR2019.

      ], image enhancement [

      Dong X, Lei Y, Wang T, Higgins K, Liu T, Curran WJ, et al. Deep learning-based attenuation correction in the absence of structural information for whole-body PET imaging. Phys Med Biol. 2019;in press, doi: 10.1088/1361-6560/ab652c.

      ,
      • Harms J.
      • Lei Y.
      • Wang T.
      • Zhang R.
      • Zhou J.
      • Tang X.
      • et al.
      Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography.
      ], segmentation [
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
      ,

      Dai W, Dong N, Wang Z, Liang X, Zhang H, Xing EP. SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-Rays. DLMIA/ML-CDS@MICCAI2017.

      ], classification and detection [

      Zhang Q, Wang H, Lu H, Won D, Yoon SW. Medical Image Synthesis with Generative Adversarial Networks for Tissue Recognition. 2018 IEEE International Conference on Healthcare Informatics (ICHI)2018. p. 199-207.

      ], augmentation [
      • Han C.
      • Murao K.
• Satoh S.
      • Nakayama H.
      Learning More with Less: GAN-based Medical Image Augmentation.
      ], and cross-modality image synthesis [
      • Lei Y.
      • Harms J.
      • Wang T.
      • Liu Y.
      • Shu H.K.
      • Jani A.B.
      • et al.
      MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks.
      ].

      Overview of works

An overview of GAN methods is shown in Table 5. As discussed above for the FCN-based methods, one challenge in medical image segmentation is that these methods may suffer boundary leakage in low-contrast regions. An adversarial loss introduced via a discriminator can take high-order potentials into account to address this problem [

      Yang D, Xu D, Zhou SK, Georgescu B, Chen M, Grbic S, et al. Automatic Liver Segmentation Using an Adversarial Image-to-Image Network. MICCAI2017.

]. The adversarial loss can be regarded as a learned similarity measure between the segmented contours and the annotated ground truth (manual contours) for medical image segmentation tasks. Instead of only measuring a voxel-wise classification loss such as the Dice loss or cross-entropy loss, the discriminator in a GAN can map the segmented and ground truth masks to a latent space and measure their global similarity. A logistic loss between the latent-space features of the segmented and ground truth masks can be used to measure shape similarity. The idea is analogous to the perceptual loss widely used in natural image processing.
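A sketch of how such an adversarial term is typically combined with a voxel-wise loss (S and D are placeholder segmenter and discriminator networks, and lambda_adv is an illustrative weight, not a value from the surveyed papers):

import torch
import torch.nn.functional as F

def segmenter_loss(S, D, image, gt_mask, lambda_adv=0.1):
    """Voxel-wise loss plus an adversarial term that rewards masks the
    discriminator cannot tell apart from manual contours."""
    pred = torch.sigmoid(S(image))
    voxel_loss = F.binary_cross_entropy(pred, gt_mask)
    d_out = D(image, pred)                       # score for the predicted pair
    adv_loss = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    return voxel_loss + lambda_adv * adv_loss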
Table 5. Overview of GAN methods.
Ref. | Year | Network | Dimension | Site | Modality

      Dai W, Dong N, Wang Z, Liang X, Zhang H, Xing EP. SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-Rays. DLMIA/ML-CDS@MICCAI2017.

2015 | SCAN | 2D slice | Chest | X-rays
      • Kamnitsas K.
      • Baumgartner C.
      • Ledig C.
      • Newcombe V.
      • Simpson J.
      • Kane A.
      • et al.
      Unsupervised Domain Adaptation in Brain Lesion Segmentation with Adversarial Networks.
2017 | Multi-connected adversarial networks | 2D slice | Brain | Multi-modality MRI
      • Moeskops P.
      • Veta M.
      • Lafarge M.W.
      • Eppenhof K.A.J.
      • Pluim J.P.W.
      Adversarial Training and Dilated Convolutions for Brain MRI Segmentation.
2017 | Dilated GAN | 2D slice | Brain | MRI
      • Rezaei M.
      • Harmuth K.
      • Gierke W.
      • Kellermeier T.
      • Fischer M.
      • Yang H.
      • et al.
      A Conditional Adversarial Network for Semantic Segmentation of Brain Tumor.
2017 | Conditional GAN | 2D slice | Brain tumor | MRI
      • Son J.
      • Park S.J.
      • Jung K.-H.
      Retinal Vessel Segmentation in Fundoscopic Images with Generative Adversarial Networks.
2017 | GAN | 2D patch | Retinal vessel | Fundoscopic

      Yang D, Xu D, Zhou SK, Georgescu B, Chen M, Grbic S, et al. Automatic Liver Segmentation Using an Adversarial Image-to-Image Network. MICCAI2017.

2017 | Adversarial image-to-image network | 3D volume | Liver | CT
      • Zhu W.
      • Xiang X.
      • Tran T.D.
      • Hager G.D.
      • Xie X.
      Adversarial deep structured nets for mass segmentation from mammograms. IEEE 15th International Symposium on Biomedical.
2017 | Adversarial FCN-CRF nets | 2D slice | Mass | Mammograms
      • Li Z.
      • Wang Y.
      • Yu J.
      Brain Tumor Segmentation Using an Adversarial Network.
2018 | GAN | Not specified | Brain tumor | MRI
      • Mondal A.K.
      • Dolz J.
      • Desrosiers C.
      Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning.
2018 | Few-shot GAN | 3D patch | Brain | MRI
      • Rezaei M.
      • Yang H.
      • Meinel C.
      Whole Heart and Great Vessel Segmentation with Context-aware of Generative Adversarial Networks.
2018 | Context-aware GAN | 2D cropped slices | Cardiac | MRI
      • Rezaei M.
      • Yang H.
      • Meinel C.
      Conditional Generative Refinement Adversarial Networks for Unbalanced Medical Image Semantic Segmentation.
2018 | Conditional generative refinement adversarial networks | 2D slice | Brain | MRI
      • Xue Y.
      • Xu T.
      • Zhang H.
      • Long L.R.
      • Huang X.
      SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation.
2018 | SegAN | 2D slice | Brain | MRI
      • Zhang L.
      • Pereañez M.
      • Piechnik S.K.
      • Neubauer S.
      • Petersen S.E.
      • Frangi A.F.
      Multi-Input and Dataset-Invariant Adversarial Learning (MDAL) for Left and Right-Ventricular Coverage Estimation in Cardiac MRI.
2018 | MDAL | 2D slice | Left and right ventricles | Cardiac MRI

      Zhang Y, Miao S, Mansi T, Liao R. Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation. Medical Image Computing and Computer Assisted Intervention - Miccai 2018, Pt Ii. 2018;11071:599-607.

2018 | TD-GAN | 2D slice | Whole body | X-ray
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
2019 | U-Net-GAN | 3D volume | Thorax | CT
      • Mahmood F.
      • Borders D.
      • Chen R.
      • McKay G.N.
      • Salimian K.J.
      • Baras A.
      • et al.
      Deep Adversarial Training for Multi-Organ Nuclei Segmentation in Histopathology Images.
2019 | Conditional GAN | 2D slice | Nuclei | Histopathology images
      • Trullo R.
      • Petitjean C.
      • Dubray B.
      • Ruan S.
      Multiorgan segmentation using distance-aware adversarial networks. Journal of Medical.
2019 | Distance-aware GAN | 2D slice | Chest | CT
      • Tong N.
      • Gou S.
      • Yang S.
      • Cao M.
      • Sheng K.
      Shape constrained fully convolutional DenseNet with adversarial training for multiorgan segmentation on head and neck CT and low-field MR images.
2019 | Shape constraint GAN | 3D volume | Head & Neck | CT/MRI

      Cai J, Xia Y, Yang D, Xu D, Yang L, Roth H. End-to-End Adversarial Shape Learning for Abdomen Organ Deep Segmentation. MLMI@MICCAI2019.

2019 | Shape constraint GAN | 3D volume | Abdomen | CT

      Bnouni N, Rekik I, Rhim MS, Amara NB. Context-Aware Synergetic Multiplex Network for Multi-organ Segmentation of Cervical Cancer MRI. PRIME@MICCAI2020.

2020 | CycleGAN | 2D slice | Pelvic organs | MRI
Dai et al. proposed a structure-correcting adversarial network (SCAN) to segment the lungs and the heart in chest X-ray (CXR) images [

      Dai W, Dong N, Wang Z, Liang X, Zhang H, Xing EP. SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-Rays. DLMIA/ML-CDS@MICCAI2017.

]. SCAN used an FCN to generate the binary masks of the segmented organs and a critic network to judge whether the segmented structures are reasonable from a physiological perspective. The critic network was trained to discriminate between the ground truth organ annotations and the segmented masks generated by the network. The critic network helps to regularize the appearance of the segmentation result to achieve realistic segmentation outcomes.
      GAN can be used to alleviate the problem of training data shortage. It is common that only limited datasets are available for network training since it is very time-consuming and laborious to manually generate large datasets especially for multi-organ segmentation. To overcome this challenge, Mondal et al. proposed a GAN-based method that is capable of learning from a few labeled images. The network was used to perform 3D multimodal brain MRI segmentation from a few-shot learning perspective [
      • Mondal A.K.
      • Dolz J.
      • Desrosiers C.
      Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning.
]. The adversarial network encouraged the segmentation outputs for unannotated images to follow a distribution similar to that of annotated images, thereby improving the generalization ability of the network.
      Dong et al. proposed a conditional GAN to train deep neural networks for the segmentation of multiple organs on thoracic CT images [
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
]. The proposed U-Net generative adversarial network (U-Net-GAN) utilized a set of U-Nets as generators and fully convolutional networks (FCNs) as discriminators. The U-Nets were trained to produce segmentation maps of multiple organs, while the FCN discriminators distinguished the ground-truth contours from the organs segmented by the generators. The generators and discriminators competed against each other in an adversarial learning process to improve the multi-organ segmentation results. For multi-organ segmentation, training one universal network to segment all targets usually reduces segmentation accuracy. The GAN method in [
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
] grouped OARs of similar dimensions and utilized three sub-networks for segmentation: one for the lungs and heart, and the other two for the esophagus and spinal cord, respectively (a sketch of this grouping strategy follows). This approach improved segmentation accuracy at the cost of computational efficiency. The technique was applied to delineate the left and right lungs, spinal cord, esophagus, and heart on the chest CTs of 35 patients; the average DSCs for these five OARs were 0.97, 0.97, 0.90, 0.75, and 0.87, respectively.
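As a concrete illustration of the grouping strategy, the sketch below routes organ groups of similar dimensions to dedicated sub-networks and merges the per-group predictions into a single multi-organ label map. The placeholder sub-networks and the label bookkeeping are our own simplifications (overlapping group predictions are naively overwritten here), not the cited implementation.

```python
# Sketch: dedicated sub-networks per organ group, merged into one
# multi-organ label map. The "sub-networks" are placeholders.
import torch
import torch.nn as nn

def make_subnet(n_classes):                   # stand-in for a U-Net generator
    return nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv3d(8, n_classes, 3, padding=1))

groups = {                                    # organ group -> sub-network
    ("left_lung", "right_lung", "heart"): make_subnet(4),  # 3 organs + background
    ("esophagus",): make_subnet(2),
    ("spinal_cord",): make_subnet(2),
}

def segment(volume):                          # volume: (1, 1, D, H, W)
    label_map = torch.zeros(volume.shape[2:], dtype=torch.long)
    organ_id = 1
    for organs, net in groups.items():
        pred = net(volume).argmax(dim=1)[0]   # per-group labels, 0 = background
        for k, _ in enumerate(organs, start=1):
            label_map[pred == k] = organ_id   # remap to a global organ id
            organ_id += 1
    return label_map

print(segment(torch.randn(1, 1, 16, 32, 32)).unique())
```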

      Discussion

GANs can improve segmentation accuracy by training a discriminator to provide an adversarial loss. However, GAN-based networks can be difficult to train, since the generator and discriminator need to be trained simultaneously to reach a Nash equilibrium. Binary classification of the results as real or fake yields a stepped, non-smooth gradient, which makes the discriminator difficult to train. To alleviate this problem, the Wasserstein GAN [

      Arjovsky M, Chintala S, Bottou L. Wasserstein Generative Adversarial Networks. ICML2017.

] was proposed, replacing the binary classification with an Earth-Mover (Wasserstein) distance-based metric to improve gradient back-propagation during discriminator training. Although GAN training can be difficult and time-consuming, once trained only the generator is used to perform segmentation, so inference cost is unaffected. Using the adversarial loss as a shape regularizer is most beneficial when the target organ has a regular, distinctive shape, e.g., the lungs and heart, but is less useful for small tubular objects such as vessels and catheters.
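The difference between the two discriminator objectives fits in a few lines. The sketch below contrasts the standard binary cross-entropy discriminator loss with the Wasserstein critic loss; note that a practical WGAN additionally constrains the critic (weight clipping or a gradient penalty), which is omitted here.

```python
# Standard GAN discriminator loss vs. Wasserstein critic loss.
import torch
import torch.nn.functional as F

real_score = torch.randn(8, 1)    # critic outputs on real masks
fake_score = torch.randn(8, 1)    # critic outputs on generated masks

# Standard GAN: classify real vs. fake; gradients saturate once the
# two classes are easily separated.
d_loss_gan = F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score)) \
           + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score))

# WGAN: widen the score gap between real and fake; the linear
# (Earth-Mover) objective keeps a useful gradient even when real and
# fake are far apart.
d_loss_wgan = fake_score.mean() - real_score.mean()
g_loss_wgan = -fake_score.mean()
print(d_loss_gan.item(), d_loss_wgan.item(), g_loss_wgan.item())
```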

      Synthetic image-aided segmentation

Multi-modal images can improve segmentation accuracy since each imaging modality has its own advantages and disadvantages. For example, CT images provide high bony-structure definition but low soft-tissue contrast, whereas MR images offer high soft-tissue contrast but lower spatial resolution. It is therefore beneficial to use multi-modal images for segmentation. However, multi-modal images are not always available for the images to be segmented, and even when another modality exists, the images must first be co-registered to be content-consistent. As an alternative, cross-modality image synthesis has been used to aid the segmentation process [
      • Fu Y.
      • Lei Y.
      • Wang T.
      • Tian S.
      • Patel P.
      • Jani A.
      • et al.
      Pelvic Multi-organ Segmentation on CBCT for Prostate Adaptive Radiotherapy.
      ,
      • Liu Y.
      • Lei Y.
      • Fu Y.
      • Wang T.
      • Zhou J.
      • Jiang X.
      • et al.
      Head and Neck Multi-Organ Auto-Segmentation on CT Images Aided by Synthetic MRI.
      ,
      • Dai X.
      • Lei Y.
      • Wang T.
      • Dhabaan A.
      • McDonald M.
      • Beitler J.
      • et al.
      Synthetic MRI-aided Head-and-Neck Organs-at-Risk Auto-Delineation for CBCT-guided Adaptive Radiotherapy.
      ].

      Overview of works

The surveyed synthetic image-aided segmentation methods are listed in Table 6. Accurate segmentation of the pelvic OARs on CT images for treatment planning is challenging due to poor soft-tissue contrast [
      • Yang X.
      • Lei Y.
      • Wang T.
      • Patel P.R.
      • Jiang X.
      • Liu T.
      • et al.
      MRI-Based Synthetic CT for Radiation Treatment of Prostate Cancer.
      ,
      • Lei Y.
      • Wang T.
      • Harms J.
      • Shafai-Erfani G.
      • Tian S.
      • Higgins K.
      • et al.
      MRI-based pseudo CT generation using classification and regression random forest. SPIE Medical.
      ]. MRI has been used to aid CT prostate delineation, but it is not as accessible as CT for radiation therapy [
      • Lei Y.
      • Harms J.
      • Wang T.
      • Tian S.
      • Zhou J.
      • Shu H.K.
      • et al.
      MRI-based synthetic CT generation using semantic random forest with iterative refinement.
      ,
      • Lei Y.
      • Jeong J.J.
      • Wang T.
      • Shu H.K.
      • Patel P.
      • Tian S.
      • et al.
      MRI-based pseudo CT synthesis using anatomical signature and alternating random forest with iterative refinement model.
      ]. Lei et al. developed a deep attention-based segmentation strategy to segment CT pelvic organs with the help of synthetic MRI (sMRI), which were generated by cycle generative adversarial network (CycleGAN) [
      • Lei Y.
      • Harms J.
      • Wang T.
      • Liu Y.
      • Shu H.K.
      • Jani A.B.
      • et al.
      MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks.
      ,

      Lei Y, Dong X, Tian Z, Liu Y, Tian S, Wang T, et al. CT prostate segmentation based on synthetic MRI-aided deep attention fully convolution network. Med Phys. 2019;in press, doi: 10.1002/mp.13933.

]. This method consists of two steps: first, a CycleGAN was used to estimate sMRI from CT images; second, a deep attention FCN was trained based on the sMRI and manual contours deformed from the MRIs. Attention models were introduced to focus on the prostate boundary. Inspired by this method, Dong et al. developed a novel sMRI-aided multi-organ segmentation method for male pelvic CT [
      • Dong X.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Patel P.
      • Curran W.J.
      • et al.
      Synthetic MRI-aided multi-organ segmentation on male pelvic CT using cycle consistent deep attention network.
]. The DSCs between the segmented and manual contours for the bladder, prostate, and rectum were 0.95 ± 0.03, 0.87 ± 0.04 and 0.89 ± 0.04, respectively. Similarly, Lei et al. extended this method to multi-organ segmentation of pelvic cone-beam computed tomography (CBCT) data for a potential CBCT-guided adaptive radiotherapy workflow [

      Lei Y, Wang T, Tian S, Dong X, Jani AB, Schuster D, et al. Male pelvic multi-organ segmentation aided by CBCT-based synthetic MRI. Phys Med Biol. 2019;in press, doi: 10.1088/1361-6560/ab63bb.

]. The DSCs between the segmented and physicians' manual contours for the bladder, prostate, and rectum were 0.95 ± 0.02, 0.86 ± 0.06 and 0.91 ± 0.04, respectively.
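The two-step inference flow shared by these sMRI-aided methods can be sketched as follows. Here `ct_to_mri_generator` and `seg_net` stand in for the trained CycleGAN generator and segmentation network; concatenating the CT with its sMRI as a two-channel input is one plausible fusion choice, not necessarily the exact design of the cited works.

```python
# Sketch of two-step sMRI-aided inference: CT -> synthetic MRI -> labels.
import torch
import torch.nn as nn

ct_to_mri_generator = nn.Conv3d(1, 1, 3, padding=1)  # stand-in for CycleGAN G
seg_net = nn.Conv3d(2, 4, 3, padding=1)              # stand-in, 4 organ classes

@torch.no_grad()
def segment_ct(ct_volume):
    smri = ct_to_mri_generator(ct_volume)            # step 1: synthesize MRI
    both = torch.cat([ct_volume, smri], dim=1)       # step 2: segment from CT + sMRI
    return seg_net(both).argmax(dim=1)

labels = segment_ct(torch.randn(1, 1, 16, 32, 32))
print(labels.shape)  # torch.Size([1, 16, 32, 32])
```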
Table 6. Overview of synthetic image-aided image segmentation.
Ref. | Year | Network | Dimension | Site | Modality
      • Dong X.
      • Lei Y.
      • Tian S.
      • Wang T.
      • Patel P.
      • Curran W.J.
      • et al.
      Synthetic MRI-aided multi-organ segmentation on male pelvic CT using cycle consistent deep attention network.
      ,

      Lei Y, Dong X, Tian Z, Liu Y, Tian S, Wang T, et al. CT prostate segmentation based on synthetic MRI-aided deep attention fully convolution network. Med Phys. 2019;in press, doi: 10.1002/mp.13933.

2019 | Synthetic MRI-aided | 2.5D patch | Pelvic | CT

      Lei Y, Wang T, Tian S, Dong X, Jani AB, Schuster D, et al. Male pelvic multi-organ segmentation aided by CBCT-based synthetic MRI. Phys Med Biol. 2019;in press, doi: 10.1088/1361-6560/ab63bb.

      ,
      • Fu Y.
      • Lei Y.
      • Wang T.
      • Tian S.
      • Patel P.
      • Jani A.
      • et al.
      Pelvic Multi-organ Segmentation on CBCT for Prostate Adaptive Radiotherapy.
2019 | Synthetic MRI-aided | 3D volume | Pelvic | CBCT
      • Liu Y.
      • Lei Y.
      • Fu Y.
      • Wang T.
      • Zhou J.
      • Jiang X.
      • et al.
      Head and Neck Multi-Organ Auto-Segmentation on CT Images Aided by Synthetic MRI.
      ,
      • Dai X.
      • Lei Y.
      • Wang T.
      • Dhabaan A.
      • McDonald M.
      • Beitler J.
      • et al.
      Synthetic MRI-aided Head-and-Neck Organs-at-Risk Auto-Delineation for CBCT-guided Adaptive Radiotherapy.
2020 | Synthetic MRI-aided | 3D volume | Head-and-Neck | CT/CBCT
Head-and-neck (HN) multi-organ segmentation is very challenging for radiotherapy because many small, vital structures exist in this region. HN CT/CBCT images suffer from low soft-tissue contrast and image artifacts, which makes it difficult to accurately segment all the OARs; FCN methods often suffer boundary leakage as a result. To alleviate this problem and increase automatic segmentation accuracy, Liu et al. proposed a synthetic MRI-aided CT segmentation method using a dual pyramid network [
      • Liu Y.
      • Lei Y.
      • Fu Y.
      • Wang T.
      • Zhou J.
      • Jiang X.
      • et al.
      Head and Neck Multi-Organ Auto-Segmentation on CT Images Aided by Synthetic MRI.
]. A CycleGAN image synthesis network was first trained on manually registered MRI-CT image pairs and then used to generate synthetic MRI from the original CT images. The synthetic MRI and CT images were both taken as inputs to train a dual pyramid network (a minimal sketch of this dual-input design is given after this paragraph). The authors achieved superior results compared with the 2015 HN challenge. Similarly, Dai et al. proposed a synthetic MRI-aided CBCT HN segmentation method for adaptive radiotherapy [
      • Dai X.
      • Lei Y.
      • Wang T.
      • Dhabaan A.
      • McDonald M.
      • Beitler J.
      • et al.
      Synthetic MRI-aided Head-and-Neck Organs-at-Risk Auto-Delineation for CBCT-guided Adaptive Radiotherapy.
      ].
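A minimal sketch of the dual-input idea is given below: CT and sMRI are encoded by separate branches whose features are fused before decoding. The actual dual pyramid network uses multi-scale feature pyramids and attention; this stripped-down version only illustrates the feature-level fusion.

```python
# Stripped-down dual-encoder segmentation sketch (feature-level fusion).
import torch
import torch.nn as nn

class DualEncoderSeg(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.enc_ct = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.enc_mri = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv3d(16, n_classes, 3, padding=1)  # fuse + classify

    def forward(self, ct, smri):
        feats = torch.cat([self.enc_ct(ct), self.enc_mri(smri)], dim=1)
        return self.decoder(feats)

net = DualEncoderSeg()
out = net(torch.randn(1, 1, 16, 32, 32), torch.randn(1, 1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 4, 16, 32, 32])
```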

      Discussion

Compared with using only CT/CBCT images, sMRI-aided segmentation achieves higher accuracy because the sMRI provides complementary information for the network to learn. For the prostate, this improves segmentation accuracy and alleviates the prostate volume overestimation that occurs when using CT images alone. For HN, sMRI improves the boundary definition of small organ structures thanks to its superior soft-tissue contrast. However, the image quality of the sMRI largely depends on the quality of the training MRI-CT image pairs, which are usually aligned using deformable image registration, so the performance of this method can be affected by registration quality. Robust and accurate registration is particularly difficult in the abdomen, especially when the MRI and CT images were acquired on different days.

      Benchmark

Because the surveyed methods were evaluated on different datasets, it is difficult for readers to compare their accuracy directly. To facilitate comparison, we summarized the performance of the surveyed works that used the same benchmark datasets. The two benchmark datasets considered here are from the 2017 AAPM Thoracic Auto-segmentation Challenge [
      • Yang J.
      • Veeraraghavan H.
      • Armato 3rd, S.G.
      • Farahani K.
      • Kirby J.S.
      • Kalpathy-Kramer J.
      • et al.
      Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017.
      ] and 2015 MICCAI Head and Neck Auto-segmentation Challenge [
      • Raudaschl P.F.
      • Zaffino P.
      • Sharp G.C.
      • Spadea M.F.
      • Chen A.
      • Dawant B.M.
      • et al.
      Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015.
      ].

2017 AAPM Thoracic Auto-segmentation Challenge

The 2017 AAPM Thoracic Auto-segmentation Challenge provided a benchmark dataset for evaluating automatic multi-organ segmentation methods on thoracic CT images. The OARs included the left and right lungs, heart, esophagus, and spinal cord. Sixty thoracic CT scans provided by three institutions were separated into 36 training, 12 offline-testing, and 12 online-testing scans. The clinical contours used for treatment planning were quality-checked and edited to adhere to the RTOG 1106 contouring guidelines. According to the challenge report, seven participants completed the online challenge, five of whom used DL-based methods. In addition to the participating methods, other algorithms that used these datasets are also listed in Table 7.
Table 7. DL-based methods using the 2017 AAPM Thoracic Auto-segmentation Challenge datasets.
Metric | Method | Esophagus | Heart | Left Lung | Right Lung | Spinal Cord
DSC | DCNN (Team Elekta*) | 0.72 ± 0.10 | 0.93 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02 | 0.88 ± 0.037
DSC | 3D U-Net
      • Feng X.
      • Qing K.
      • Tustison N.J.
      • Meyer C.H.
      • Chen Q.
      Deep convolutional neural network for segmentation of thoracic organs-at-risk using cropped 3D images.
0.72 ± 0.10 | 0.93 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02 | 0.89 ± 0.04
DSC | Multi-class CNN (Team Mirada*) | 0.71 ± 0.12 | 0.91 ± 0.02 | 0.98 ± 0.02 | 0.97 ± 0.02 | 0.87 ± 0.110
DSC | 2D ResNet (Team Beaumont*) | 0.61 ± 0.11 | 0.92 ± 0.02 | 0.96 ± 0.03 | 0.95 ± 0.05 | 0.85 ± 0.035
DSC | 3D and 2D U-Net (Team WUSTL*) | 0.55 ± 0.20 | 0.85 ± 0.04 | 0.95 ± 0.03 | 0.96 ± 0.02 | 0.83 ± 0.080
DSC | U-Net-GAN
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
0.75 ± 0.08 | 0.87 ± 0.05 | 0.97 ± 0.01 | 0.97 ± 0.01 | 0.90 ± 0.04
MSD (mm) | DCNN (Team Elekta*) | 2.23 ± 2.82 | 2.05 ± 0.62 | 0.74 ± 0.31 | 1.08 ± 0.54 | 0.73 ± 0.21
MSD (mm) | 3D U-Net
      • Feng X.
      • Qing K.
      • Tustison N.J.
      • Meyer C.H.
      • Chen Q.
      Deep convolutional neural network for segmentation of thoracic organs-at-risk using cropped 3D images.
2.34 ± 2.38 | 2.30 ± 0.49 | 0.59 ± 0.29 | 0.93 ± 0.57 | 0.66 ± 0.25
MSD (mm) | Multi-class CNN (Team Mirada*) | 2.08 ± 1.94 | 2.98 ± 0.93 | 0.62 ± 0.35 | 0.91 ± 0.52 | 0.76 ± 0.60
MSD (mm) | 2D ResNet (Team Beaumont*) | 2.48 ± 1.15 | 2.61 ± 0.69 | 2.90 ± 6.94 | 2.70 ± 4.84 | 1.03 ± 0.84
MSD (mm) | 3D and 2D U-Net (Team WUSTL*) | 13.10 ± 10.39 | 4.55 ± 1.59 | 1.22 ± 0.61 | 1.13 ± 0.49 | 2.10 ± 2.49
MSD (mm) | U-Net-GAN
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
1.05 ± 0.66 | 1.49 ± 0.85 | 0.61 ± 0.73 | 0.65 ± 0.53 | 0.38 ± 0.27
HD95 (mm) | DCNN (Team Elekta*) | 7.3 ± 10.31 | 5.8 ± 1.98 | 2.9 ± 1.32 | 4.7 ± 2.50 | 2.0 ± 0.37
HD95 (mm) | 3D U-Net
      • Feng X.
      • Qing K.
      • Tustison N.J.
      • Meyer C.H.
      • Chen Q.
      Deep convolutional neural network for segmentation of thoracic organs-at-risk using cropped 3D images.
8.71 ± 10.59 | 6.57 ± 1.50 | 2.10 ± 0.94 | 3.96 ± 2.85 | 1.89 ± 0.63
HD95 (mm) | Multi-class CNN (Team Mirada*) | 7.8 ± 8.17 | 9.0 ± 4.29 | 2.3 ± 1.30 | 3.7 ± 2.08 | 2.0 ± 1.15
HD95 (mm) | 2D ResNet (Team Beaumont*) | 8.0 ± 3.80 | 8.8 ± 5.31 | 7.8 ± 19.13 | 14.5 ± 34.4 | 2.3 ± 0.50
HD95 (mm) | 3D and 2D U-Net (Team WUSTL*) | 37.0 ± 26.88 | 13.8 ± 5.49 | 4.4 ± 3.41 | 4.1 ± 2.11 | 8.10 ± 10.72
HD95 (mm) | U-Net-GAN
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
4.52 ± 3.81 | 4.58 ± 3.67 | 2.07 ± 1.93 | 2.50 ± 3.34 | 1.19 ± 0.46
      *Note: Participating methods of the AAPM thorax challenge
      • Yang J.
      • Veeraraghavan H.
      • Armato 3rd, S.G.
      • Farahani K.
      • Kirby J.S.
      • Kalpathy-Kramer J.
      • et al.
      Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017.
      .
The listed DL-based methods show no significant differences in DSC for the lungs, heart and spinal cord. In comparison, the esophagus DSC varies more widely due to its low soft-tissue contrast. The DSC metric can be biased since it tends to favor large-volume organs such as the lungs and heart. Likewise, when interpreting MSD and HD95 it is important to note whether extensive post-processing was performed, since post-processing can significantly affect surface agreement metrics. Without post-processing, the U-Net-GAN method shows better surface agreement with the ground truth, owing to the GAN's additional regularization of the segmented organ shapes.
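For reference, the three metrics reported in Table 7 can be computed from binary masks as in the numpy/scipy sketch below, which takes a mask's surface as the mask minus its erosion and assumes isotropic 1 mm voxels; a production evaluation would account for anisotropic voxel spacing.

```python
# DSC, mean surface distance (MSD) and 95th-percentile Hausdorff
# distance (HD95) on binary masks; isotropic 1 mm voxels assumed.
import numpy as np
from scipy import ndimage

def surface_distances(a, b):
    """Distances from the surface voxels of `a` to the surface of `b`."""
    surf_a = a ^ ndimage.binary_erosion(a)
    surf_b = b ^ ndimage.binary_erosion(b)
    return ndimage.distance_transform_edt(~surf_b)[surf_a]

def dsc(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def msd_hd95(a, b):          # symmetric: pool distances in both directions
    d = np.concatenate([surface_distances(a, b), surface_distances(b, a)])
    return d.mean(), np.percentile(d, 95)

pred = np.zeros((32, 32, 32), bool); pred[8:20, 8:20, 8:20] = True
gt = np.zeros((32, 32, 32), bool); gt[10:22, 10:22, 10:22] = True
print(dsc(pred, gt), *msd_hd95(pred, gt))
```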

2015 MICCAI Head and Neck Auto-segmentation Challenge

      The 2015 MICCAI Head and Neck Auto-segmentation Challenge [
      • Raudaschl P.F.
      • Zaffino P.
      • Sharp G.C.
      • Spadea M.F.
      • Chen A.
      • Dawant B.M.
      • et al.
      Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015.
] provided a benchmark dataset to evaluate the performance of automatic multi-organ segmentation methods for head-and-neck CT images. The OARs were the brainstem, mandible, chiasm, bilateral optic nerves, bilateral parotid glands, and bilateral submandibular glands. The dataset included 40 images: 25 for training, 10 for off-site testing, and 5 for on-site testing. The images were chosen to ensure good image quality, complete OAR coverage and minimal tumor overlap with OARs, without any age or gender restrictions. The report of this challenge [
      • Raudaschl P.F.
      • Zaffino P.
      • Sharp G.C.
      • Spadea M.F.
      • Chen A.
      • Dawant B.M.
      • et al.
      Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015.
] did not include any DL-based method. We therefore surveyed recent DL-based multi-organ segmentation methods that used this benchmark dataset; their performance in terms of DSC and HD95 is listed in Table 8.
Table 8. DL-based methods using the 2015 MICCAI Head and Neck Auto-segmentation Challenge datasets.
Metric | Organ | then one value column per method, in the following order:
Shape model constrained FCN
      • Raudaschl P.F.
      • Zaffino P.
      • Sharp G.C.
      • Spadea M.F.
      • Chen A.
      • Dawant B.M.
      • et al.
      Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015.
      Two-stage U-Net
      • Wang Y.
      • Zhao L.
      • Wang M.
      • Song Z.
      Organ at Risk Segmentation in Head and Neck CT Images Using a Two-Stage Segmentation Framework Based on 3D U-Net.
      AnatomyNet
      • Zhu W.
      • Huang Y.
      • Zeng L.
      • Chen X.
      • Liu Y.
      • Qian Z.
      • et al.
      AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy.
      DL-based

      Tang H, Chen X, Liu Y, Lu Z, You J, Yang M, et al. Clinically applicable deep learning framework for organs at risk delineation in CT images. 2019.

      Synthetic MRI-aided
      • Liu Y.
      • Lei Y.
      • Fu Y.
      • Wang T.
      • Zhou J.
      • Jiang X.
      • et al.
      Head and Neck Multi-Organ Auto-Segmentation on CT Images Aided by Synthetic MRI.
      3D U-Net

      Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. MICCAI2016.

      3D-CNN
      • Ren X.
      • Xiang L.
      • Nie D.
      • Shao Y.
      • Zhang H.
      • Shen D.
      • et al.
      Interleaved 3D-CNNs for joint segmentation of small-volume structures in head and neck CT images.
DSC | Brain Stem | 0.87 ± 0.03 | 0.88 ± 0.02 | 0.87 ± 0.02 | 0.87 ± 0.03 | 0.91 ± 0.02 | 0.80 ± 0.08 | N.A.
DSC | Chiasm | 0.58 ± 0.1 | 0.45 ± 0.17 | 0.53 ± 0.15 | 0.62 ± 0.1 | 0.73 ± 0.11 | N.A. | 0.58 ± 0.17
DSC | Mandible | 0.87 ± 0.03 | 0.93 ± 0.02 | 0.93 ± 0.02 | 0.95 ± 0.01 | 0.96 ± 0.01 | 0.94 ± 0.02 | N.A.
DSC | Left Optic Nerve | 0.65 ± 0.05 | 0.74 ± 0.15 | 0.72 ± 0.06 | 0.75 ± 0.07 | 0.78 ± 0.09 | 0.72 ± 0.06 | 0.72 ± 0.08
DSC | Right Optic Nerve | 0.69 ± 0.5 | 0.74 ± 0.09 | 0.71 ± 0.1 | 0.72 ± 0.06 | 0.78 ± 0.11 | 0.70 ± 0.07 | 0.70 ± 0.09
DSC | Left Parotid | 0.84 ± 0.02 | 0.86 ± 0.02 | 0.88 ± 0.02 | 0.89 ± 0.02 | 0.88 ± 0.04 | 0.87 ± 0.03 | N.A.
DSC | Right Parotid | 0.83 ± 0.02 | 0.85 ± 0.07 | 0.87 ± 0.04 | 0.88 ± 0.05 | 0.88 ± 0.06 | 0.85 ± 0.07 | N.A.
DSC | Left Submandibular | 0.76 ± 0.06 | 0.76 ± 0.15 | 0.81 ± 0.04 | 0.82 ± 0.05 | 0.86 ± 0.08 | 0.76 ± 0.09 | N.A.
DSC | Right Submandibular | 0.81 ± 0.06 | 0.73 ± 0.01 | 0.81 ± 0.04 | 0.82 ± 0.05 | 0.85 ± 0.10 | 0.78 ± 0.07 | N.A.
HD95 (mm) | Brain Stem | 4.01 ± 0.93 | 2.01 ± 0.33 | N.A. | N.A. | N.A. | N.A. | N.A.
HD95 (mm) | Chiasm | 2.17 ± 1.04 | 2.83 ± 1.42 | N.A. | N.A. | N.A. | N.A. | 2.81 ± 1.56
HD95 (mm) | Mandible | 1.50 ± 0.32 | 1.26 ± 0.50 | N.A. | N.A. | N.A. | N.A. | N.A.
HD95 (mm) | Left Optic Nerve | 2.52 ± 1.04 | 2.53 ± 2.34 | N.A. | N.A. | N.A. | N.A. | 2.33 ± 0.84
HD95 (mm) | Right Optic Nerve | 2.90 ± 1.88 | 2.13 ± 2.45 | N.A. | N.A. | N.A. | N.A. | 2.13 ± 0.96
HD95 (mm) | Left Parotid | 3.97 ± 2.15 | 2.41 ± 0.54 | N.A. | N.A. | N.A. | N.A. | N.A.
HD95 (mm) | Right Parotid | 4.20 ± 1.27 | 2.93 ± 1.48 | N.A. | N.A. | N.A. | N.A. | N.A.
HD95 (mm) | Left Submandibular | 5.59 ± 3.93 | 2.86 ± 1.60 | N.A. | N.A. | N.A. | N.A. | N.A.
HD95 (mm) | Right Submandibular | 4.84 ± 1.67 | 3.44 ± 1.55 | N.A. | N.A. | N.A. | N.A. | N.A.
For HN organ segmentation, the synthetic MRI-aided method shows nearly consistent improvement over the other DL-based methods, demonstrating the efficacy of synthetic MRI in image segmentation. Particularly significant improvement was achieved for chiasm segmentation. However, as Liu et al. pointed out, synthetic MRI may be less effective for organs such as the parotid, for which CT already offers good contrast; accordingly, the synthetic MRI-aided method performs similarly to the other methods for parotid segmentation. Training a synthetic image generation network requires well-aligned CT-MRI image pairs, and their unavailability poses an additional challenge for synthetic image generation.

      Other considerations

Data collected directly from clinical databases are usually not ready for network training. It is necessary to perform preprocessing such as image resizing, image cropping, image normalization and data augmentation prior to network training. Other preprocessing techniques include registration [
      • Fu Y.
      • Lei Y.
      • Wang T.
      • Curran W.J.
      • Liu T.J.
      • Yang X.J.A.
      Deep Learning in Medical Image Registration: A Review.
      ], bias/scatter/attenuation correction [
      • Lei Y.
      • Tang X.
      • Higgins K.
      • Lin J.
      • Jeong J.
      • Liu T.
      • et al.
      Learning-based CBCT correction using alternating random forest based on auto-context model.
      ,
      • Yang X.
      • Wang T.
      • Lei Y.
      • Higgins K.
      • Liu T.
      • Shim H.
      • et al.
      MRI-based attenuation correction for brain PET/MRI based on anatomic signature and machine learning.
      ], voxel intensity normalization [
      • Zhou X.-Y.
      • Yang G.-Z.
      Normalization in Training U-Net for 2-D Biomedical Semantic Segmentation.
      ] and cropping [
      • Wang T.
      • Lei Y.
      • Shafai-Erfani G.
      • Jiang X.
      • Dong X.
      • Zhou J.
      • et al.
Learning-based automatic segmentation on arteriovenous malformations from contrast-enhanced CT images. SPIE Medical.
]. Data augmentation is used to increase the number of training samples and reduce network over-fitting. Typical augmentation techniques include image rotation, translation, scaling, flipping, distortion, linear warping, elastic deformation, and noise contamination (a minimal sketch follows this paragraph). During training, the ground-truth contours are obtained by physicians' manual delineation. Depending on how the manual contours were generated, a DL-based method is likely biased toward the physicians' contouring style as a systematic error, with contouring uncertainty contributing a random error. This limitation is expected in all supervised learning-based methods.
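A few of the augmentations listed above can be sketched as follows; the label map must be transformed with nearest-neighbour interpolation (order=0) to preserve integer labels, and the parameter ranges are illustrative. In practice the zoomed output would also be cropped or padded back to a fixed input size.

```python
# On-the-fly augmentation sketch: rotation, flip, scaling, noise,
# applied jointly to a 3D image and its label map.
import numpy as np
from scipy import ndimage

def augment(image, labels, rng=np.random.default_rng()):
    angle = rng.uniform(-10, 10)                      # small in-plane rotation
    image = ndimage.rotate(image, angle, axes=(1, 2), reshape=False, order=1)
    labels = ndimage.rotate(labels, angle, axes=(1, 2), reshape=False, order=0)
    if rng.random() < 0.5:                            # random left-right flip
        image, labels = image[:, :, ::-1], labels[:, :, ::-1]
    zoom = rng.uniform(0.9, 1.1)                      # random in-plane scaling
    image = ndimage.zoom(image, (1, zoom, zoom), order=1)
    labels = ndimage.zoom(labels, (1, zoom, zoom), order=0)
    image = image + rng.normal(0, 0.01, image.shape)  # noise contamination
    return image, labels

img, lab = augment(np.random.rand(8, 64, 64), np.zeros((8, 64, 64), np.uint8))
print(img.shape, lab.shape)
```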
      Depending on the network design and GPU availability, some methods use the whole image volume as input to train the network [

      Christ PF, Elshaer MEA, Ettlinger F, Tatavarty S, Bickel M, Bilic P, et al. Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. MICCAI2016.

] whereas others use 2D image slices [
      • Ronneberger O.
      • Fischer P.
      • Brox T.
      U-Net: Convolutional Networks for Biomedical Image Segmentation.
]. The 2D-based approaches reduce memory requirements and computational cost; however, they fail to utilize the spatial information in the third dimension. Another way to exploit 3D feature information while avoiding memory overflow is to apply 2D kernels to multidirectional 2D images: segmentation results from different image planes, such as the axial, coronal and sagittal planes, can be combined using a surface-based contour refinement method [
      • Lei Y.
      • Tian S.
      • He X.
      • Wang T.
      • Wang B.
      • Patel P.
      • et al.
      Ultrasound prostate segmentation based on multidirectional deeply supervised V-Net.
      ]. 3D image patches are also widely used as network input [
      • Alex V.
      • Vaidhya K.
      • Thirunavukkarasu S.
      • Kesavadas C.
      • Krishnamurthi G.
      Semisupervised learning using denoising autoencoders for brain lesion detection and segmentation.
      ,