
Artificial intelligence and machine learning for medical imaging: A technology review

      Highlights

      • Artificial intelligence (AI) has transformed the field of medical image analysis.
      • Gathering key knowledge about AI becomes a must for the medical community.
      • This review presents the basic technological pillars of AI for medical image analysis.
      • We also discuss the state-of-the-art AI methods and the new trends in the field.

      Abstract

      Artificial intelligence (AI) has recently become a very popular buzzword, as a consequence of disruptive technical advances and impressive experimental results, notably in the field of image analysis and processing. In medicine, specialties where images are central, like radiology, pathology or oncology, have seized the opportunity, and considerable research and development efforts have been deployed to transfer the potential of AI to clinical applications. With AI becoming a more mainstream tool for typical medical image analysis tasks, such as diagnosis, segmentation, or classification, the key to a safe and efficient use of clinical AI applications lies, in part, in informed practitioners. The aim of this review is to present the basic technological pillars of AI, together with the state-of-the-art machine learning methods and their application to medical imaging. In addition, we discuss the new trends and future research directions. This will help the reader to understand how AI methods are becoming a ubiquitous tool in any medical image analysis workflow and pave the way for the clinical implementation of AI-based solutions.

      Introduction

      For the last decade, the locution Artificial Intelligence (AI) has progressively flooded many scientific journals, including those of image processing and medical physics. Paradoxically, though, AI is an old concept: its formalization started in the 1940s, and the term artificial intelligence itself was coined in 1956 by John McCarthy. In short, AI refers to computer algorithms that can mimic features characteristic of human intelligence, such as problem solving or learning. The latest success of AI has been made possible by the tremendous growth of both computational power and data availability. In particular, AI applications based on machine learning (ML) algorithms have experienced unprecedented breakthroughs during the last decade in the field of computer vision. The medical community has taken advantage of these extraordinary developments to build AI applications that get the most out of medical images, automating different steps of clinical practice or providing support for clinical decisions. Papers relying on AI and ML report promising results in a wide range of medical applications [
      • Singh R.
      • Wu W.
      • Wang G.
      • Kalra M.K.
      Artificial intelligence in image reconstruction: the change is here.
      ,
      • Wang M.
      • Zhang Q.
      • Lam S.
      • Cai J.
      • Yang R.
      A review on application of deep learning algorithms in external beam radiotherapy automated treatment planning.
      ,

      Wang C, Zhu X, Hong JC, Zheng D. Artificial intelligence in radiotherapy treatment planning: present and future. Technol Cancer Res Treat 2019;18:1533033819873922.

      ,
      • Litjens G.
      • Kooi T.
      • Bejnordi B.E.
      • Setio A.A.A.
      • Ciompi F.
      • Ghafoorian M.
      • et al.
      A survey on deep learning in medical image analysis.
      ,
      • Wang T.
      • Lei Y.
      • Fu Y.
      • Wynne J.F.
      • Curran W.J.
      • Liu T.
      • et al.
      A review on medical imaging synthesis using deep learning and its clinical applications.
      ,
      • Thompson R.F.
      • Valdes G.
      • Fuller C.D.
      • Carpenter C.M.
      • Morin O.
      • Aneja S.
      • et al.
      Artificial intelligence in radiation oncology: a specialty-wide disruptive transformation?.
      ,
      • Hosny A.
      • Parmar C.
      • Quackenbush J.
      • Schwartz L.H.
      • Aerts H.J.W.L.
      Artificial intelligence in radiology.
      ]. Disease diagnosis, image segmentation or outcome prediction are some of the tasks that are experiencing a disruptive transformation thanks to the latest progress of AI.
      More recently, ML tools have become mature enough to fulfill clinical requirements and, thus, research and clinical teams, as well as companies, are working together to develop clinical AI solutions. Today, we are closer than ever to the clinical implementation of AI and, therefore, knowing the basics of this technology becomes a “must” for every professional in the medical field. Helping the medical physics community to acquire a solid background knowledge about AI and learning methods, including their evolution and current state of the art, will certainly result in higher quality research, facilitate the first steps of new researchers in this field, and inspire novel research directions.
      The goal of this review article is to briefly walk the reader through some basic AI concepts, with a focus on medical image processing (Section 2), followed by a presentation of the state-of-the-art methods and current trends in the domain (Section 3). Finally, we discuss the future research directions that will make possible the next generation of AI-based solutions for medical imaging applications (Section 4).

      Building blocks of AI methods for medical imaging

      The field of AI evolves rapidly, with new methods published at a high pace. However, there are several central concepts that have settled for good. This section presents a brief overview of these building blocks for AI methods, with a focus on medical imaging. For more detailed descriptions we refer to relevant books [
      • Morra L.
      • Delsanto S.
      • Correale L.
      ,

      Ranschaert ER, Morozov S, Algra PR, editors. Artificial intelligence in medical imaging: opportunities, applications and risks. Springer, Cham; 2019.

      ,

      Friedman J, Hastie T, Tibshirani R. The elements of statistical learning. vol. 1. Springer series in statistics. New York; 2001.

      ,
      • Kuhn M.
      • Johnson K.
      Applied predictive modeling.
      ] and publications [
      • Shen C.
      • Nguyen D.
      • Zhou Z.
      • Jiang S.B.
      • Dong B.
      • Jia X.
      An introduction to deep learning in medical physics: advantages, potential, and challenges.
      ,
      • Cui S.
      • Tseng H.
      • Pakela J.
      • Ten Haken R.K.
      • El Naqa I.
      Introduction to machine and deep learning for medical physicists.
      ].

      Artificial intelligence, machine learning, and deep learning

      As mentioned previously, AI broadly refers to any method or algorithm that mimics human intelligence. Historically, AI has been approached from two directions: computationalism and connectionism. The former attempts to mimic formal reasoning and logic directly, regardless of its biological implementation. Mostly based on hardcoded axioms and rules that are combined to deduce new conclusions, computationalism is conceptually similar to computers, storing and processing symbols. Connectionism, on the other hand, follows a bottom-up approach, starting from models of biological neurons that are interconnected in large numbers and from which intelligence is intended to emerge by learning from experience.
      Expert systems [
      • Holman J.G.
      • Cookson M.J.
      Expert systems for medical applications.
      ,
      • Haug P.J.
      Uses of diagnostic expert systems in clinical care.
      ,
      • Miller R.A.
      Medical diagnostic decision support systems–past, present, and future: a threaded bibliography and brief commentary.
      ], which started to become very popular in the 1980s, are a classical example of computationalism. Some famous applications of expert systems in the medical field are MYCIN (diagnosis of bacterial infections in the blood) [

Buchanan BG, Shortliffe EH. Rule-based expert systems: the MYCIN experiments of the Stanford Heuristic Programming Project. Addison-Wesley Publishing Company; 1984.

      ], PUFF (interpretation of pulmonary function data) [
      • Aikins J.S.
      • Kunz J.C.
      • Shortliffe E.H.
      • Fallat R.J.
      PUFF: an expert system for interpretation of pulmonary function data.
      ], or INTERNIST-1 (diagnosis for internal medicine) [
      • Miller R.A.
      • Pople Jr, H.E.
      • Myers J.D.
      Internist-1, an experimental computer-based diagnostic consultant for general internal medicine.
      ]. However, the bottleneck of expert systems is the complexity of acquiring the required knowledge in the form of production rules; thus, interest in computationalist algorithms began to fade in the 1990s in favor of connectionist approaches [

      Buchanan BG. Can Machine Learning Offer Anything to Expert Systems? In: Marcus S, editor. Knowledge Acquisition: Selected Research and Commentary: A Special Issue of Machine Learning on Knowledge Acquisition, Boston, MA: Springer US; 1990, p. 5–8.

      ,
      • Su M.C.
      Use of neural networks as medical diagnosis expert systems.
      ]. The appeal of connectionism and learning-based AI lies in the fact that it delegates the responsibility for accuracy and exhaustiveness to data instead of human experts, who may be scarce, or prone to error, bias, or subjectivity. The ever-growing abundance of data, including medical images, typically tilts the scales in favor of learning techniques, and the community has focused successively on two nested subfamilies (Fig. 1): machine learning and deep learning.
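As a toy illustration of the computationalist approach described above, a production-rule system can be sketched in a few lines of code. The rules, findings, and conclusions below are invented placeholders, not real clinical knowledge:

```python
# Toy production-rule system in the spirit of MYCIN-style expert systems.
# All rules and findings are invented placeholders, not clinical knowledge.
RULES = [
    (lambda f: f.get("fever") and f.get("positive_blood_culture"),
     "suspect bacterial infection"),
    (lambda f: f.get("cough") and f.get("reduced_lung_volume"),
     "suspect restrictive pulmonary pattern"),
]

def infer(findings):
    """Apply every rule whose condition holds and collect its conclusion."""
    return [conclusion for condition, conclusion in RULES if condition(findings)]

print(infer({"fever": True, "positive_blood_culture": True}))
# prints ['suspect bacterial infection']
```

Every such rule must be elicited from a human expert, which is precisely the knowledge-acquisition bottleneck that limited expert systems.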
      Fig. 1Artificial intelligence, machine learning, and deep learning can be seen as matryoshkas nested in each other. Artificial intelligence gathers both symbolic (top down) and connectionist (bottom up) approaches. Machine learning is the dominant branch of connectionism, combining biological (neural networks) and statistical (data-driven learning theory) influences. Deep learning focuses mainly on large-size neural networks, with functional specificities to process images, sounds, videos, etc.
      The specificity of machine learning (ML) is that it is driven by data, which gives machines (computers) “the ability to learn without being explicitly programmed”, as formulated in 1959 by Arthur Samuel, a pioneer in the ML field. ML typically works in two phases, training and inference. Training allows patterns to be found in previously collected data, whereas inference applies these patterns to new unseen data to carry out a certain task, like prediction or decision making. Since the 1990s, ML algorithms have continuously evolved and improved, becoming more sophisticated and including hierarchical structures, which gave rise to the popular deep learning. The term Deep Learning (DL) was first coined by Aizenberg et al. [
      • Aizenberg I.N.
      • Aizenberg N.N.
      • Vandewalle J.
      ] in the 2000s, and refers to a subset of ML algorithms, with the particularity of being organized hierarchically, on multiple levels, hence the term “deep”, to automatically extract meaningful features from data.
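The two ML phases described above, training and inference, can be sketched on synthetic one-dimensional data; the underlying relationship (y = 2x + 1) and all numbers are invented for this illustration:

```python
import numpy as np

# Minimal illustration of the two ML phases on synthetic 1-D data: the
# "true" relationship y = 2x + 1 and all numbers are invented for this sketch.
rng = np.random.default_rng(0)

# --- Training: find a pattern (slope and intercept) in collected data ---
x_train = rng.uniform(0, 10, size=50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 0.1, size=50)
slope, intercept = np.polyfit(x_train, y_train, deg=1)  # least-squares fit

# --- Inference: apply the learned pattern to new, unseen inputs ---
x_new = np.array([3.0, 7.5])
y_pred = slope * x_new + intercept
```

The same two-phase structure holds for any ML model; only the pattern being learned (here, two coefficients; in DL, millions of weights) changes.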
      Although ML encompasses DL, DL is often opposed to classical “shallow” ML, the latter relying on algorithms that have a flatter architecture and depend on prior feature engineering to extract data representations. This distinction also reflects the evolution from ML to DL, namely, from specific feature engineering to generic feature learning. While ML generally relies on domain knowledge and expertise to define relevant features, DL involves generic, trainable features. In other words, despite the modeling power of ML, global performance remains limited by the adequacy of manually picked features. Instead, DL replaces these fixed specialized features with generic, trainable, low-level features that are involved in the learning procedure, thereby offering better performance guarantees. Sophistication is here achieved by stacking layers of simple features, leading to a hierarchical model structure. As training in DL concerns both the low-level features and the higher-level model, DL is often referred to as an end-to-end approach. For image data, this approach typically allows DL to learn optimal filters.
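The contrast between hand-engineered and trainable image features can be sketched as follows; the toy image, the Sobel-like filter, and the variable names are illustrative only:

```python
import numpy as np

# A toy 8x8 "image" with a vertical edge; values are illustrative only.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Classical ML: the feature extractor is fixed by the designer, e.g. a
# Sobel-like vertical-edge filter; only the downstream model is trained.
handcrafted_filter = np.array([[-1.0, 0.0, 1.0],
                               [-2.0, 0.0, 2.0],
                               [-1.0, 0.0, 1.0]])

# Deep learning: the filter itself starts as random weights and is updated
# during training (only the initialization is shown here).
rng = np.random.default_rng(42)
learned_filter = rng.normal(0.0, 0.1, size=(3, 3))

def filter_response(img, kernel):
    """Valid-mode 2-D cross-correlation (the 'convolution' of DL layers)."""
    h = img.shape[0] - kernel.shape[0] + 1
    w = img.shape[1] - kernel.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kernel.shape[0],
                                   j:j + kernel.shape[1]] * kernel)
    return out

# The hand-crafted filter responds strongly only around the edge columns.
edge_response = filter_response(image, handcrafted_filter)
```

In a deep network, `learned_filter` would be adjusted by gradient descent until it, too, responds to whatever patterns best solve the end task, which is what "end-to-end" learning of optimal filters means.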
      Today, ML models have reached important milestones, in some cases accomplishing tasks with an accuracy similar to or even better than that of human experts. For instance, the diagnostic performance of DL models has been shown to be equivalent to that of health-care professionals for certain applications [
      • Liu X.
      • Faes L.
      • Kale A.U.
      • Wagner S.K.
      • Fu D.J.
      • Bruynseels A.
      • et al.
      A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis.
      ], such as skin-cancer detection [
      • Esteva A.
      • Kuprel B.
      • Novoa R.A.
      • Ko J.
      • Swetter S.M.
      • Blau H.M.
      • et al.
      Dermatologist-level classification of skin cancer with deep neural networks.
      ] or breast cancer detection [
      • Lotter W.
      • Diab A.R.
      • Haslam B.
      • Kim J.G.
      • Grisot G.
      • Wu E.
      • et al.
      Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach.
      ]. In particular, the latter reported a DL model that not only reached an excellent performance in mammogram classification, but also outperformed five out of five full-time breast-imaging specialists with an average increase in sensitivity of 14% [
      • Lotter W.
      • Diab A.R.
      • Haslam B.
      • Kim J.G.
      • Grisot G.
      • Wu E.
      • et al.
      Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach.
      ]. Image segmentation is another task that has experienced a transformation with the advent of ML algorithms. For instance, a recent study has described a DL model that can perform organ segmentation in the head and neck region from CT images with performance comparable to experienced radiographers [

      Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

      ]. For more detailed examples of the performance of state-of-the-art ML and DL methods for medical applications we refer to Section 3.

      Learning frameworks and strategies

      Machine learning can be broadly split into two complementary categories, supervised and unsupervised, which are inspired by human learning (Table 1). Supervised learning is the simplest and provides the tightest framework with the strongest guarantees. It formalizes learning with a parent or teacher, providing the inputs and controlling the outputs. In supervised learning, the training data thus consists of labelled or annotated (input, output) pairs, and the model is trained to yield the desired output when presented with some input. When data is not annotated, unsupervised learning, also known as self-organization, aims at discovering patterns in data (Fig. 2).
      Table 1. Different learning frameworks and strategies, together with some of the most popular algorithms or techniques used for each of them, as well as a few examples of common applications in the field of medical imaging. The table is divided into three parts: the basic learning frameworks (supervised, unsupervised and reinforcement learning), the hybrid learning frameworks blending supervised and unsupervised, and finally common learning strategies that solve consecutive learning problems or combine several models together.
      Learning styleCommon algorithms / methodsExamples
      BASIC LEARNING FRAMEWORKS
      Supervised learning
      • Linear or logistic regression
      • Decision trees and random forests
      • Support vector machines
      • Convolutional neural networks
      • Recurrent neural networks
      • Cancer diagnosis
        • Han Z.
        • Wei B.
        • Zheng Y.
        • Yin Y.
        • Li K.
        • Li S.
        Breast cancer multi-classification from histopathological images with structured deep learning model.
        ,
        • Wang H.
        • Zhou Z.
        • Li Y.
        • Chen Z.
        • Lu P.
        • Wang W.
        • et al.
        Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from F-FDG PET/CT images.
        ,
        • Becker A.S.
        • Mueller M.
        • Stoffel E.
        • Marcon M.
        • Ghafoor S.
        • Boss A.
        Classification of breast cancer in ultrasound imaging using a generic deep learning analysis software: a pilot study.
        ,
        • Cheng J.-Z.
        • Ni D.
        • Chou Y.-H.
        • Qin J.
        • Tiu C.-M.
        • Chang Y.-C.
        • et al.
        Computer-aided diagnosis with deep learning architecture: applications to breast lesions in us images and pulmonary nodules in CT scans.
      • Organ segmentation

        Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

        ,
        • Guo Z.
        • Li X.
        • Huang H.
        • Guo N.
        • Li Q.
        Deep learning-based image segmentation on multimodal medical imaging.
        ,
        • Balagopal A.
        • Kazemifar S.
        • Nguyen D.
        • Lin M.-H.
        • Hannan R.
        • Owrangi A.
        • et al.
        Fully automated organ segmentation in male pelvic CT images.
        ,
        • Javaid U.
        • Dasnoy D.
        • Lee J.A.
        Multi-organ segmentation of chest CT images in radiation oncology: comparison of standard and dilated UNet.
        ,
        • Moradi S.
        • Oghli M.G.
        • Alizadehasl A.
        • Shiri I.
        • Oveisi N.
        • Oveisi M.
        • et al.
        MFP-Unet: a novel deep learning based approach for left ventricle segmentation in echocardiography.
        ,
        • Nemoto T.
        • Futakami N.
        • Yagi M.
        • Kunieda E.
        • Akiba T.
        • Takeda A.
        • et al.
        Simple low-cost approaches to semantic segmentation in radiation therapy planning for prostate cancer using deep learning with non-contrast planning CT images.
      • Radiotherapy dose denoising
        • Javaid U.
        • Souris K.
        • Dasnoy D.
        • Huang S.
        • Lee J.A.
        Mitigating inherent noise in Monte Carlo dose distributions using dilated U-Net.
      • Radiotherapy dose prediction
        • Nguyen D.
        • Jia X.
        • Sher D.
        • Lin M.-H.
        • Iqbal Z.
        • Liu H.
        • et al.
        3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
        ,
        • Fan J.
        • Wang J.
        • Chen Z.
        • Hu C.
        • Zhang Z.
        • Hu W.
        Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique.
      • Conversion between image modalities
        • Han X.
        MR-based synthetic CT generation using a deep convolutional neural network method.
        ,
        • Kazemifar S.
        • McGuire S.
        • Timmerman R.
        • Wardak Z.
        • Nguyen D.
        • Park Y.
        • et al.
        MRI-only brain radiotherapy: assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach.
      Unsupervised learning
      • (Variational) Auto encoders
      • Dimensionality reduction
        (e.g., Principal component analysis)
      • Clustering (e.g., K-means)
      • Domain adaptation tasks

        Madani A, Moradi M, Karargyris A, Syeda-Mahmood T. Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018) 2018. https://doi.org/10.1109/isbi.2018.8363749.

        ,
        • Jiang J.
        • Hu Y.-C.
        • Tyagi N.
        • Rimner A.
        • Lee N.
        • Deasy J.O.
        • et al.
        PSIGAN: joint probabilistic segmentation and image distribution matching for unpaired cross-modality adaptation-based MRI segmentation.
        ,
        • Chen C.
        • Dou Q.
        • Chen H.
        • Qin J.
        • Heng P.A.
        Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation.
        ,
        • Perone C.S.
        • Ballester P.
        • Barros R.C.
        • Cohen-Adad J.
        Unsupervised domain adaptation for medical imaging segmentation with self-ensembling.
        ,
        • Dou Q.
        • Ouyang C.
        • Chen C.
        • Chen H.
        • Glocker B.
        • Zhuang X.
        • et al.
        PnP-AdaNet: plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation.
      • Classification of patient groups
        • Lynch C.M.
        • van Berkel V.H.
        • Frieboes H.B.
        Application of unsupervised analysis techniques to lung cancer patient data.
      • Image reconstruction
        • Mehta J.
        • Majumdar A.
        RODEO: robust DE-aliasing autoencOder for real-time medical image reconstruction.
      Reinforcement learning
      • Q-learning
      • Markov Decision Processes
      HYBRID LEARNING FRAMEWORKS
      Semi-supervised learning
      • Generative Adversarial Networks
      • Tumor classification [45,46]
      • Organ segmentation
        • Burton 2nd, W.
        • Myers C.
        • Rullkoetter P.
        Semi-supervised learning for automatic segmentation of the knee from MRI with convolutional neural networks.
      • Synthetic image generation
        • Liang X.
        • Chen L.
        • Nguyen D.
        • Zhou Z.
        • Gu X.
        • Yang M.
        • et al.
        Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy.
        ,
        • Yang X.
        • Lin Y.
        • Wang Z.
        • Li X.
        • Cheng K.-T.
        Bi-modality medical image synthesis using semi-supervised sequential generative adversarial networks.
      Self-supervised learning
      • Pretext task: distortion (e.g. rotation), color- or intensity-based, patch extraction
      LEARNING STRATEGIES
      Transfer learning
      • Inductive
      • Transductive
      • Unsupervised
      • Radiotherapy toxicity prediction
        • Zhen X.
        • Chen J.
        • Zhong Z.
        • Hrycushko B.
        • Zhou L.
        • Jiang S.
        • et al.
        Deep convolutional neural network with transfer learning for rectum toxicity prediction in cervical cancer radiotherapy: a feasibility study.
      • Adaptation to different clinical practices
        • Kandalan R.N.
        • Nguyen D.
        • Rezaeian N.H.
        • Barragán-Montero A.M.
        • Breedveld S.
        • Namuduri K.
        • et al.
        Dose prediction with deep learning for prostate cancer radiation therapy: model adaptation to different treatment planning practices.
      • Improving model generalization
        • Liang X.
        • Nguyen D.
        • Jiang S.B.
        Generalizability issues with deep learning models in medicine and their potential solutions: illustrated with cone-beam computed tomography (CBCT) to computed tomography (CT) image conversion.
      Ensemble learning
      • Bagging (Bootstrap AGGregatING; e.g. random forests)
      • Boosting (e.g. AdaBoost, gradient boosting)
      • Radiotherapy dose prediction
        • McIntosh C.
        • Welch M.
        • McNiven A.
        • Jaffray D.A.
        • Purdie T.G.
        Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
        ,
        • McIntosh C.
        • Purdie T.G.
        Voxel-based dose prediction with multi-patient atlas selection for automated radiotherapy treatment planning.
      • Estimation of uncertainty
        • Nguyen D.
        • Sadeghnejad Barkousaraie A.
        • Bohara G.
        • Balagopal A.
        • McBeth R.
        • Lin M.-H.
        • et al.
        A comparison of Monte Carlo dropout and bootstrap aggregation on the performance and uncertainty estimation in radiation therapy dose prediction with deep learning neural networks.
      • Stratification of patients
        • Valdes G.
        • Luna J.M.
        • Eaton E.
        • Simone 2nd, C.B.
        • Ungar L.H.
        • Solberg T.D.
        MediBoost: a patient stratification tool for interpretable decision making in the era of precision medicine.
      Fig. 2Three classical learning frameworks in artificial intelligence: supervised, semi-supervised, and unsupervised learning. Supervised learning relies on known input–output pairs. If some output labels are difficult or expensive to get, semi-supervised learning can apply. If no labels are available, unsupervised learning allows for a more exploratory approach of data.
      Typical supervised tasks involve function approximation, like regression and classification. Classification can be binary, as in determining whether a pathology is present in an image [
      • Lotter W.
      • Diab A.R.
      • Haslam B.
      • Kim J.G.
      • Grisot G.
      • Wu E.
      • et al.
      Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach.
      ,
      • Shen L.
      • Margolies L.R.
      • Rothstein J.H.
      • Fluder E.
      • McBride R.
      • Sieh W.
      Deep learning to improve breast cancer detection on screening mammography.
      ], involve multiple classes, as in determining a particular pathology among several labels [
      • Komura D.
      • Ishikawa S.
      Machine learning methods for histopathological image analysis.
      ,
      • Yadav S.S.
      • Jadhav S.M.
      Deep convolutional neural network based medical image classification for disease diagnosis.
      ,
      • Zhang X.
      • Zhao H.
      • Zhang S.
      • Li R.
      A novel deep neural network model for multi-label chronic disease prediction.
      ], or concern not the whole image but each pixel, as done for image segmentation [
      • Hesamian M.H.
      • Jia W.
      • He X.
      • Kennedy P.
      Deep learning techniques for medical image segmentation: achievements and challenges.
      ,
      • Ibragimov B.
      • Xing L.
      Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks.
      ]. On the regression side, also in a pixel-wise way, typical tasks include image enhancement (e.g., improving a low-quality image, the input, by mapping it to its higher-quality counterpart, the output label or annotation) [
      • Javaid U.
      • Souris K.
      • Dasnoy D.
      • Huang S.
      • Lee J.A.
      Mitigating inherent noise in Monte Carlo dose distributions using dilated U-Net.
      ] or image-to-image mapping (e.g. mapping a CT image, the input, to the corresponding dose distribution, the output) [
      • Nguyen D.
      • Long T.
      • Jia X.
      • Lu W.
      • Gu X.
      • Iqbal Z.
      • et al.
      A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning.
      ]. More examples of clinical applications of supervised learning and the common ML methods used within this learning framework are presented in Table 1.
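A minimal sketch of supervised classification from labelled (input, output) pairs can be given with a simple nearest-centroid rule; the two-dimensional features mimic two image-derived measurements, and all values, names, and class labels are synthetic, invented for illustration:

```python
import numpy as np

# Supervised classification from labelled (input, output) pairs, using a
# nearest-centroid rule. The 2-D features mimic two image-derived
# measurements; all values are synthetic, not real clinical data.
rng = np.random.default_rng(1)
class0 = rng.normal([0, 0], 0.5, size=(20, 2))   # e.g. "no lesion"
class1 = rng.normal([3, 3], 0.5, size=(20, 2))   # e.g. "lesion"
X_train = np.vstack([class0, class1])
y_train = np.array([0] * 20 + [1] * 20)

# Training: estimate one centroid per class from the labelled pairs.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Inference: assign the label of the nearest class centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(predict(np.array([2.8, 3.2])))  # → 1 (close to the "lesion" centroid)
```

The methods listed in Table 1 (support vector machines, random forests, convolutional networks, etc.) replace this centroid rule with far more expressive decision functions, but the train-on-pairs, predict-on-new-inputs structure is the same.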
      In contrast, most unsupervised tasks relate to probability density estimation, like clustering (finding separated groups of similar data items), outlier or anomaly detection (isolated items), or even manifold learning and dimensionality reduction (subspaces on which data concentrate). The use of unsupervised learning has been, so far, much more limited than that of its supervised counterpart, although useful applications for medical imaging exist, such as domain adaptation (e.g., adapting a segmentation model trained on one image modality to work on a different one) [
      • Chen C.
      • Dou Q.
      • Chen H.
      • Qin J.
      • Heng P.A.
      Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation.
      ,
      • Perone C.S.
      • Ballester P.
      • Barros R.C.
      • Cohen-Adad J.
      Unsupervised domain adaptation for medical imaging segmentation with self-ensembling.
      ,
      • Dou Q.
      • Ouyang C.
      • Chen C.
      • Chen H.
      • Glocker B.
      • Zhuang X.
      • et al.
      PnP-AdaNet: plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation.
      ], data generation (e.g., generating realistic artificial images) [
      • Liu Y.
      • Lei Y.
      • Wang Y.
      • Wang T.
      • Ren L.
      • Lin L.
      • et al.
      MRI-based treatment planning for proton radiotherapy: dosimetric validation of a deep learning-based liver synthetic CT generation method.
      ,
      • Liu Y.
      • Lei Y.
      • Wang T.
      • Fu Y.
      • Tang X.
      • Curran W.J.
      • et al.
      CBCT-based synthetic CT generation using deep-attention cycleGAN for pancreatic adaptive radiotherapy.
      ,
      • Lei Y.
      • Harms J.
      • Wang T.
      • Liu Y.
      • Shu H.
      • Jani A.B.
      • et al.
      MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks.
      ] or even image segmentation [
      • Aganj I.
      • Harisinghani M.G.
      • Weissleder R.
      • Fischl B.
      Unsupervised medical image segmentation based on the local center of mass.
      ]. Table 1 presents some of the main ML methods that work in an unsupervised framework.
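As a minimal sketch of the unsupervised framework, a plain k-means implementation can discover the two groups present in synthetic, unlabelled data; all values and the fixed seeding indices below are illustrative choices:

```python
import numpy as np

# Unlabelled synthetic 2-D data containing two well-separated groups.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.3, size=(30, 2)),
               rng.normal([4, 4], 0.3, size=(30, 2))])

def kmeans(X, init_idx, iters=10):
    """Plain k-means: alternate nearest-centre assignment and centre update."""
    centres = X[list(init_idx)].copy()
    for _ in range(iters):
        # assign each point to its nearest centre
        labels = np.argmin(np.linalg.norm(X[:, None] - centres[None], axis=2),
                           axis=1)
        # move each centre to the mean of its assigned points
        centres = np.array([X[labels == c].mean(axis=0)
                            for c in range(len(centres))])
    return labels, centres

# Seed the centres with two data points (fixed indices for determinism).
labels, centres = kmeans(X, init_idx=(0, 30))
```

No labels are ever provided: the grouping emerges purely from the geometry of the data, which is what "discovering patterns" means in this framework.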
      Semi-supervised learning is a hybrid framework halfway between supervised and unsupervised learning; it involves data for which the desired outputs are only partly known. Groups identified as clusters by unsupervised learning can be used as possible class labels [
      • Peikari M.
      • Salama S.
      • Nofech-Mozes S.
      • Martel A.L.
      A cluster-then-label semi-supervised learning approach for pathology image classification.
      ] (Fig. 2). Some examples of clinical applications for semi-supervised learning include the generation or translation of images from a specific class to another in a semi-supervised setting (e.g., generation of synthetic CTs from MR images) [
      • Jin C.-B.
      • Kim H.
      • Liu M.
      • Han I.H.
      • Lee J.I.
      • Lee J.H.
      • et al.
      DC2Anet: generating lumbar spine MR images from CT scan data based on semi-supervised learning.
      ,
      • Wang Z.
      • Lin Y.
      • Cheng K.-T.-T.
      • Yang X.
      Semi-supervised mp-MRI data synthesis with StitchLayer and auxiliary distance maximization.
      ], and segmentation or classification of images with partially labelled data [
      • Ge C.
      • Gu I.-Y.-H.
      • Jakola A.S.
      • Yang J.
      Deep semi-supervised learning for brain tumor classification.
      ,
      • Burton 2nd, W.
      • Myers C.
      • Rullkoetter P.
      Semi-supervised learning for automatic segmentation of the knee from MRI with convolutional neural networks.
      ].
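The cluster-then-label idea can be illustrated with a minimal sketch: cluster all samples, labelled and unlabelled alike, then name each cluster after the majority label among its few labelled members. The synthetic two-class data, the choice of k-means, and all parameter values below are illustrative assumptions, not taken from the cited works.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 200 points in 2 groups, but only 10 of them carry a label.
X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)
labelled_idx = np.arange(10)           # pretend only these were annotated
y_partial = np.full(200, -1)           # -1 marks "unlabelled"
y_partial[labelled_idx] = y_true[labelled_idx]

# Step 1: cluster ALL points, ignoring labels (unsupervised).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: name each cluster after the majority label of its labelled members.
y_pred = np.empty(200, dtype=int)
for c in np.unique(clusters):
    members = clusters == c
    labelled_members = members & (y_partial != -1)
    if labelled_members.any():
        majority = np.bincount(y_partial[labelled_members]).argmax()
    else:
        majority = 0                   # fallback: cluster got no labelled points
    y_pred[members] = majority

accuracy = (y_pred == y_true).mean()
```

Although only 5% of the points are labelled, the cluster structure lets those few labels propagate to the whole dataset.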
So far, supervised learning has been the most widely used learning framework for medical imaging applications, as it is the most straightforward to formulate and its models are comparatively easy to train. However, it is well known that data labelling in the medical domain is an extremely time-consuming task, requiring costly inspection by human experts. Therefore, more and more researchers are now exploring semi-supervised learning techniques, which are an excellent alternative to complement small sets of carefully labelled data with large amounts of cheap unlabelled data collected automatically [
      • Peikari M.
      • Salama S.
      • Nofech-Mozes S.
      • Martel A.L.
      A cluster-then-label semi-supervised learning approach for pathology image classification.
      ,
      • Cheplygina V.
      • de Bruijne M.
      • Pluim J.P.W.
      Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis.
      ]. In fact, many of the current limitations of ML/DL algorithms come from the use of labelled data (e.g., errors in labels [
      • Frénay B.
      • Verleysen M.
      Classification in the presence of label noise: a survey.
], limited-size labelled databases, etc.) and thus, although the use of fully unsupervised learning in the medical field is still very limited, we believe that future research will focus on unsupervised techniques in order to unlock the full potential of ML. Very recently, unsupervised models have started to achieve better performance than supervised models for computer vision tasks [

      Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: Iii HD, Singh A, editors. Proceedings of the 37th international conference on machine learning, vol. 119, PMLR; 2020, p. 1597–607.

      ], and the same is likely to happen for medical imaging applications.
Yet another type of learning proceeds by interacting with an environment, where an agent gets feedback on its actions over the course of time; this is known as reinforcement learning. After each action towards a new state, the environment can either reward or punish the agent, who then has to predict the longer-term consequences of future actions in a trial-and-error fashion. The use of reinforcement learning in medical imaging is not yet widespread, but it has increased in the last couple of years, with promising applications that mimic physician behaviour for typical tasks such as the design of a treatment [
      • Zhang J.
      • Wang C.
      • Sheng Y.
      • Palta M.
      • Czito B.
      • Willett C.
      • et al.
      An interpretable planning bot for pancreas stereotactic body radiation therapy.
      ,
      • Shen C.
      • Gonzalez Y.
      • Klages P.
      • Qin N.
      • Jung H.
      • Chen L.
      • et al.
      Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer.
      ,
      • Shen C.
      • Nguyen D.
      • Chen L.
      • Gonzalez Y.
      • McBeth R.
      • Qin N.
      • et al.
      Operating a treatment planning system using a deep-reinforcement learning-based virtual treatment planner for prostate cancer intensity-modulated radiation therapy treatment planning.
      ,
      • Watts J.
      • Khojandi A.
      • Vasudevan R.
      • Ramdhani R.
      Optimizing individualized treatment planning for Parkinson’s disease using deep reinforcement learning.
      ,
      • Winkel D.J.
      • Weikert T.J.
      • Breit H.-C.
      • Chabin G.
      • Gibson E.
      • Heye T.J.
      • et al.
      Validation of a fully automated liver segmentation algorithm using multi-scale deep reinforcement learning and comparison versus manual segmentation.
      ,
      • Li Z.
      • Xia Y.
      Deep reinforcement learning for weakly-supervised lymph node segmentation in CT images.
      ], among others (Table 1).
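As an illustration of the trial-and-error principle, the sketch below trains a tabular Q-learning agent on a toy five-state corridor, where the only reward is granted upon reaching the goal; the environment, the reward scheme, and the hyperparameters are illustrative assumptions, unrelated to the clinical applications cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # corridor 0..4; actions: 0 = left, 1 = right
goal = n_states - 1
Q = np.zeros((n_states, n_actions))   # state-action value table
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration rate

for episode in range(300):
    s = int(rng.integers(0, goal))    # random non-terminal starting state
    for _ in range(1000):             # step cap keeps every episode finite
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == goal else 0.0            # reward only at the goal
        # Q-learning update: move Q[s, a] towards r + discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == goal:
            break

policy = Q.argmax(axis=1)             # greedy policy after training
```

After training, the greedy policy moves right in every non-terminal state, even though the agent was never told so explicitly: the delayed reward propagates backwards through the value table.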
      On top of these three basic learning frameworks (supervised, unsupervised and reinforcement learning), there are other strategies that enable us to reuse previously trained models (transfer learning) or combine models (ensemble learning). Transfer learning [

      Thrun S, Pratt L. Learning to learn: introduction and overview. In: Thrun S, Pratt L, editors. Learning to learn. Boston, MA: Springer US; 1998, p. 3–17.

      ,
      • Pan S.J.
      • Yang Q.
      A survey on transfer learning.
] reuses blocks and layers from a model that was pre-trained on some data for a certain task (the source domain and task) and fine-tunes them to be applied to different data and/or a different task (the target domain and task). For example, a classification model pre-trained on ImageNet (a large collection of natural images) can be partly reused and fine-tuned for medical imaging applications, such as organ segmentation or treatment outcome prediction [
      • Zhen X.
      • Chen J.
      • Zhong Z.
      • Hrycushko B.
      • Zhou L.
      • Jiang S.
      • et al.
      Deep convolutional neural network with transfer learning for rectum toxicity prediction in cervical cancer radiotherapy: a feasibility study.
      ,
      • Morid M.A.
      • Borjali A.
      • Del Fiol G.
      A scoping review of transfer learning research on medical image analysis using ImageNet.
      ,
      • Shin H.-C.
      • Roth H.R.
      • Gao M.
      • Lu L.
      • Xu Z.
      • Nogues I.
      • et al.
      Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning.
      ]. Transfer learning allows us to exploit knowledge from different but related domains, mitigating the necessity of a big dataset for the target task, and improving the model performance [
      • Shin H.-C.
      • Roth H.R.
      • Gao M.
      • Lu L.
      • Xu Z.
      • Nogues I.
      • et al.
      Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning.
      ,
      • van Opbroek A.
      • Ikram M.A.
      • Vernooij M.W.
      • de Bruijne M.
      Transfer learning improves supervised image segmentation across imaging protocols.
      ,
      • Kandalan R.N.
      • Nguyen D.
      • Rezaeian N.H.
      • Barragán-Montero A.M.
      • Breedveld S.
      • Namuduri K.
      • et al.
      Dose prediction with deep learning for prostate cancer radiation therapy: model adaptation to different treatment planning practices.
]. Ensemble learning methods are also a way to improve the overall performance and stability of a model, by combining the outputs of multiple models or algorithms to perform a task [

      Schapire RE. The strength of weak learnability. In: 30th annual symposium on foundations of computer science 1989. https://doi.org/10.1109/sfcs.1989.63451.

      ]. Some examples of medical applications include the mapping of patient anatomy to dose distribution for radiotherapy treatments [
      • Zhang J.
      • Wu Q.J.
      • Xie T.
      • Sheng Y.
      • Yin F.-F.
      • Ge Y.
      An ensemble approach to knowledge-based intensity-modulated radiation therapy planning.
      ], image segmentation [
      • Jia H.
      • Xia Y.
      • Song Y.
      • Cai W.
      • Fulham M.
      • Feng D.D.
      Atlas registration and ensemble deep convolutional neural network-based prostate segmentation using magnetic resonance imaging.
      ], or classification [
      • An N.
      • Ding H.
      • Yang J.
      • Au R.
      • Ang T.F.A.
      Deep ensemble learning for Alzheimer’s disease classification.
      ].
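A minimal sketch of ensemble learning: three heterogeneous classifiers are combined by majority (hard) voting with scikit-learn's VotingClassifier. The synthetic dataset stands in for a tabular set of image-derived features; the choice of base models and all parameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular dataset of image-derived features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three heterogeneous base models; a majority vote gives the ensemble output.
ensemble = VotingClassifier([
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("svm", SVC(random_state=0)),
], voting="hard")
ensemble.fit(X_tr, y_tr)
score = ensemble.score(X_te, y_te)
```

Because the base models make partly uncorrelated errors, the vote tends to be more stable than any individual member.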
      Last but not least, self-supervised learning is a recent hybrid framework that has become state-of-the-art in natural language processing [

      Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv [csCL] 2019.

      ,

      Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv [csCL] 2018.

      ,
      • Conneau A.
      • Khandelwal K.
      • Goyal N.
      • Chaudhary V.
      • Wenzek G.
      • Guzmán F.
      • et al.
      Unsupervised cross-lingual representation learning at scale.
      ]. It is gaining attention for computer vision tasks [
      • Jing L.
      • Tian Y.
      Self-supervised visual feature learning with deep neural networks: a survey.
      ,
      • Kolesnikov A.
      • Zhai X.
      • Beyer L.
      Revisiting self-supervised visual representation learning.
      ,
      • He K.
      • Fan H.
      • Wu Y.
      • Xie S.
      • Girshick R.
      Momentum contrast for unsupervised visual representation learning.
      ,

      Goyal P, Caron M, Lefaudeux B, Xu M, Wang P, Pai V, et al. Self-supervised pretraining of visual features in the wild. arXiv [csCV] 2021.

      ] and it could play an important role in future research directions for medical imaging applications. Self-supervised learning can be seen as a variant of unsupervised learning, in the sense that it works with unlabelled data. However, the trick here is to exploit labels that come for “free” with the data, namely, those that can be extracted from the structure of data itself. Self-supervised algorithms work in two steps. First, the model is pre-trained to solve a “pretext task” where the aim is to obtain those supervisory signals from the data. Second, the acquired knowledge is transferred and the model is fine tuned to solve the main or “downstream task”. The literature on self-supervision for medical imaging applications is still scarce [

      Taleb A, Loetzsch W, Danz N, Severin J, Gaertner T, Bergner B, et al. 3D self-supervised methods for medical imaging. arXiv [csCV] 2020.

      ,

      Hatamizadeh A, Yang D, Roth H, Xu D. UNETR: Transformers for 3D Medical Image Segmentation. arXiv [eessIV] 2021.

      ,
      • Chen L.
      • Bentley P.
      • Mori K.
      • Misawa K.
      • Fujiwara M.
      • Rueckert D.
      Self-supervised learning for medical image analysis using image context restoration.
      ,
      • Nguyen X.-B.
      • Lee G.S.
      • Kim S.H.
      • Yang H.J.
      Self-supervised learning based on spatial awareness for medical image analysis.
      ], but for instance, a recent work used context restoration as a pretext task [
      • Chen L.
      • Bentley P.
      • Mori K.
      • Misawa K.
      • Fujiwara M.
      • Rueckert D.
      Self-supervised learning for medical image analysis using image context restoration.
]. Specifically, small patches in the image were randomly selected and swapped to obtain a new image with altered spatial information, and the pretext task consisted of predicting or restoring the original version of the image. The authors later used this knowledge to tune the model for image classification of 2D fetal ultrasound images, organ localization on abdominal CT images, and segmentation of brain MR images.
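The patch-swapping corruption behind this pretext task can be sketched in a few lines; the patch size, the number of swaps, and the synthetic "scan" below are illustrative assumptions. A model trained to map the corrupted image back to the original learns contextual features that can then be transferred to the downstream task.

```python
import numpy as np

def swap_patches(img, patch=8, n_swaps=5, rng=None):
    """Corrupt an image by swapping randomly chosen pairs of small patches.

    This produces the input of a context-restoration pretext task: a network
    sees the corrupted image and is trained to reproduce the original.
    """
    rng = rng or np.random.default_rng(0)
    out = img.copy()
    h, w = img.shape
    for _ in range(n_swaps):
        y1, x1 = rng.integers(0, h - patch), rng.integers(0, w - patch)
        y2, x2 = rng.integers(0, h - patch), rng.integers(0, w - patch)
        a = out[y1:y1 + patch, x1:x1 + patch].copy()
        out[y1:y1 + patch, x1:x1 + patch] = out[y2:y2 + patch, x2:x2 + patch]
        out[y2:y2 + patch, x2:x2 + patch] = a
    return out

image = np.arange(64 * 64, dtype=float).reshape(64, 64)   # stand-in "scan"
corrupted = swap_patches(image)   # same shape, altered spatial arrangement
```

The labels come for free: each corrupted image is paired with its own uncorrupted version, so no human annotation is required.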
The existence of several hybrid learning frameworks shows that the boundaries between supervised and unsupervised learning have been progressively blurred to accommodate hybrid frameworks and combined strategies (Table 1), which can address real-world problems and data sets pragmatically (see Fig. 3).
      Fig. 3The tight framework of supervised learning can be hybridized with unsupervised learning to make room for practical cases and problems, as well as to accommodate temporality. Delaying supervision in future times leads towards reinforcement learning. Incompletely labelled data fosters semi-supervised learning, whereas small data sets encourage reusing (parts of) models trained previously on similar but bigger data sets, like in transfer learning. In self-supervision, pretraining relies on solving dummy supervised problems, where fake labels are created based on the inherent structure of image or sound data.

      Typical AI-based medical imaging analysis workflow

Reviewing past works in the AI and ML literature shows that common blocks are used in most workflows for medical image processing (Fig. 4). As ML is driven by data, the preliminary steps are to extract and select relevant features from the data, that is, quantitative characteristics that summarize the information conveyed by the data into vectors or arrays. Then, this information is fed to generic predictive models, like classifiers or regressors, which learn to perform a certain task. An example of this strategy is the field of radiomics [
      • Lambin P.
      • Leijenaar R.T.H.
      • Deist T.M.
      • Peerlings J.
      • de Jong E.E.C.
      • van Timmeren J.
      • et al.
      Radiomics: the bridge between medical imaging and personalized medicine.
      ,
      • Lambin P.
      • Rios-Velazquez E.
      • Leijenaar R.
      • Carvalho S.
      • van Stiphout R.G.P.M.
      • Granton P.
      • et al.
      Radiomics: extracting more information from medical images using advanced feature analysis.
      ], where “-omics-like” features are extracted from radiological images in order to predict some indicator of interest like a disease grade or a patient's survival.
      Fig. 4General ML pipeline for supervised learning: supervised predictive models are fed with features that are extracted and/or selected beforehand in an unsupervised way. Feature selection can, however, be embedded in some models, using regularization, for instance; selection then becomes supervised and therefore often improved. Classical (shallow) models tend to critically depend on unsupervised feature extraction and selection to preprocess data. In contrast, deep learning drops unsupervised feature extraction and selection; instead, it embeds multiple trainable layers of feature extractors and selectors, allowing the full pipeline to be supervised, end to end.

      Feature engineering, extraction, and selection

      Feature engineering, extraction, and selection are key steps to channel data to an AI method [
      • Morra L.
      • Delsanto S.
      • Correale L.
Feature engineering refers to crafting features by hand, either in an ad hoc fashion or by relying on generic features from the literature. For images, the former could be gray-level or color statistics, or shape descriptors (volume, diameter, curvature, sphericity, …). Image features are often classified into local or low-level features (specific to a small group of pixels in the image) and global or high-level features (characterizing the full image). For the latter option, generic features would for instance result from applying Gabor or Laplace filters, edge detectors like Sobel operators, texture descriptors, Zernike moments, or popular transforms like Fourier or wavelet bases. In radiomics, all the above-mentioned features can be used together, like ad hoc tumor shape and intensity descriptors, as well as textural descriptors (typically, Haralick’s gray-level co-occurrence matrix [

Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern 1973;SMC-3:610–21. https://doi.org/10.1109/TSMC.1973.4309314.

      ]).
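As a concrete illustration, a gray-level co-occurrence matrix and two Haralick-style descriptors (contrast and energy) can be computed in a few lines; the 4-level quantization, the single pixel offset, and the toy striped image are illustrative assumptions (production code would typically rely on a dedicated radiomics library).

```python
import numpy as np

def glcm(img, levels=4, dx=1, dy=0):
    """Gray-level co-occurrence matrix for a single pixel offset (dx, dy).

    Entry (i, j) estimates how often gray level i occurs next to gray
    level j, the basis of Haralick's texture descriptors.
    """
    h, w = img.shape
    M = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            M[img[y, x], img[y + dy, x + dx]] += 1
    M /= M.sum()                         # normalise to joint probabilities
    return M

def contrast(M):
    """Weighted intensity difference between co-occurring pixels."""
    i, j = np.indices(M.shape)
    return (M * (i - j) ** 2).sum()

def energy(M):
    """Sum of squared probabilities: high for uniform textures."""
    return (M ** 2).sum()

quantised = np.tile([0, 1, 2, 3], (8, 2))   # 8x8 striped test "image"
M = glcm(quantised)
```

The same matrix supports further descriptors (homogeneity, correlation, entropy), usually computed over several offsets and averaged.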
Alternatively, or as a second step, higher-level features can be extracted in a more data-driven way, using dimensionality reduction. Methods like Principal Component Analysis [
      • Jolliffe I.T.
      Principal component analysis.
      ], Linear Discriminant Analysis [
      • Cristianini N.
      Fisher discriminant analysis (linear discriminant analysis).
      ], auto-associative [
      • Hinton G.E.
      • Salakhutdinov R.R.
      Reducing the dimensionality of data with neural networks.
] networks can reduce the number of input variables according to some unsupervised or supervised criterion. For images more specifically, the convolutional filters involved in CNNs bear similarity with the filters above: they extract local features, but their parameters are learnt from data, and stacking them allows global, higher-level features to emerge. When features are not extracted in a supervised, data-driven way, some of them may turn out to be redundant or irrelevant. To address this issue, a feature selection step can discard them to focus on a reduced set of features. Feature selection can follow several strategies, either retaining the most relevant features or discarding the least relevant ones. Wrappers [

      Kohavi R, John GH. The wrapper approach. Feature extraction, construction and selection 1998:33–50. https://doi.org/10.1007/978-1-4615-5725-8_3.

] use a supervised predictive model to score subsets of features. To avoid the burden of a full-fledged predictive model, feature filters [
      • Guyon I.
      • Gunn S.
      • Nikravesh M.
      • Zadeh L.A.
      Feature extraction: foundations and applications.
      ], not to be confused with image filters above, use an unsupervised surrogate to score feature subsets, like their correlation or mutual information. Embedded methods [

      Lal TN, Chapelle O, Weston J, Elisseeff A. Embedded methods. Feature extraction n.d.:137–65. https://doi.org/10.1007/978-3-540-35488-8_6.

      ] are directly integrated into the predictive model. For instance, feature weight regularization can favor sparse configurations, where irrelevant features get null weights. Examples of features selection in radiomics can be found in [
      • Yang F.
      • Chen W.
      • Wei H.
      • Zhang X.
      • Yuan S.
      • Qiao X.
      • et al.
      Machine learning for histologic subtype classification of non-small cell lung cancer: a retrospective multicenter radiomics study.
      ,
      • Liu T.
      • Wu G.
      • Yu J.
      • Guo Y.
      • Wang Y.
      • Shi Z.
      • et al.
      A mRMRMSRC feature selection method for radiomics approach.
      ,

      Yuan R, Tian L, Chen J. An RF-BFE algorithm for feature selection in radiomics analysis. In: Medical imaging 2019: imaging informatics for healthcare, research, and applications 2019. https://doi.org/10.1117/12.2512045.

      ,
      • Oubel E.
      • Beaumont H.
      • Iannessi A.
      Mutual information-based feature selection for radiomics.
      ,
      • Wei B.-Y.
      • Bing-Yan W.E.I.
      • Song J.-L.
      • Li-Xu G.U.
      The research of reproducibility and non-redundancy feature selection methods in radiomics.
      ,
      • Sun P.
      • Wang D.
      • Mok V.C.
      • Shi L.
      Comparison of feature selection methods and machine learning classifiers for radiomics analysis in glioma grading.
      ,
      • van Timmeren J.E.
      • Leijenaar R.T.H.
      • van Elmpt W.
      • Reymen B.
      • Lambin P.
      Feature selection methodology for longitudinal cone-beam CT radiomics.
      ], for instance. Deep neural networks typically rely on this last approach with regularization. In the example of radiomics, embedded feature selection can be implemented with deep neural networks [
      • Xu Y.
      • Hosny A.
      • Zeleznik R.
      • Parmar C.
      • Coroller T.
      • Franco I.
      • et al.
      Deep learning predicts lung cancer treatment response from serial medical imaging.
      ] and regularization, hence allowing for end-to-end learning instead of combining manually engineered features with shallow predictive models.
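The filter and extraction strategies above can be sketched with scikit-learn: a mutual-information filter that keeps the top-scoring features, next to an unsupervised PCA projection. The synthetic "radiomic" dataset and the choice of retaining 10 features are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# 30 extracted "radiomic" features, of which only a handful are informative.
X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           n_redundant=5, random_state=0)

# Filter: score each feature by mutual information with the label, keep top 10.
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_selected = selector.transform(X)             # shape (300, 10)

# Extraction: project onto the 10 directions of largest variance (unsupervised).
X_pca = PCA(n_components=10).fit_transform(X)  # shape (300, 10)
```

Either reduced representation can then be fed to a downstream classifier; wrappers would instead score candidate feature subsets with the classifier itself.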

      Predictive models

Common tasks in AI are regression and classification. The AI/ML model then attempts to predict either continuous values (e.g., a dose or a survival time) or class probabilities (e.g., benign vs malignant) starting from the input features. In the following, we describe the main methodological aspects of the basic predictive ML models; state-of-the-art ML/DL methods and examples of their clinical applications for medical imaging are presented in the next section.
Regression is the most generic task in supervised learning. Linear regression is well known, but other mathematical models can involve exponential or polynomial functions. ML generalizes this concept to universal approximators that can fit data sampled from almost any smooth function, possibly with many input and output variables. Artificial neural networks (NNs) are the most iconic universal approximators (Fig. 5). They consist of interconnected formal models of neurons, a mathematical ‘cell’ combining several ‘dendritic’ inputs into a weighted sum that triggers an ‘axonal’ output through a nonlinear activation function, like a step, a sigmoid, or a hinge (Rectified Linear Unit, ReLU).
Fig. 5Artificial neural networks in a nutshell. (a) The formal neuron, processing several dendritic inputs through a nonlinear activation function f, to produce its axonal output. (b) The neurons can be interconnected in a feed-forward way, into successive layers; as soon as a nonlinear ‘hidden’ layer is inserted in between the inputs and outputs, the network can potentially approximate any function; specific activation functions can be fitted in the output layer to achieve either regression or classification. (c) Examples of nonlinear activation functions in the hidden layers: the step function, from biological inspiration; the sigmoid, its continuous and differentiable surrogate; and the rectified linear unit (ReLU), which improves training of deep layers.
      As soon as a hidden layer of neurons with nonlinear activation functions is inserted between the input and the output layers, a NN becomes a universal approximator [

Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Systems 1989;2:303–14.

]. However, a notion of capacity is associated with the NN architecture: the more neurons the hidden layer counts, the more complex the functions that can be approximated. The capacity is roughly proportional to the number of synaptic weights (parameters) in the NN; it is analogous to the polynomial order in polynomial regression, where the number of terms likewise determines the number of weights. Deep NNs are, of course, also universal approximators [

Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Systems 1989;2:303–14.

      ,
      • Hanin B.
      Universal function approximation by deep neural nets with bounded width and ReLU activations.
]. Their interest lies in trading the width of a single hidden layer for depth: stacks of hidden layers allow functional differentiation (e.g., convolutional neurons for image data) and thus hierarchical processing, which explains the later success of deep networks compared to shallow ones. Most NNs are feed-forward, meaning that data flows unidirectionally from inputs to outputs. Recurrent NNs (RNNs) add feedback loops to the feed-forward connections, allowing them to process sequences of data (text, videos) and to keep a memory of past inputs, which then gives context to new inputs.
Training of NNs relies on minimizing a loss function between the desired output and the one provided by the NN in its current parameter configuration. The partial derivatives, or gradient, of the loss function with respect to these parameters indicate the direction in which tuning the parameters is likely to decrease the loss. In a feed-forward NN, this derivative information flows back from layer to layer, and the procedure is therefore called gradient backpropagation.
      For regression, typical loss functions can be the mean square error or mean absolute error. With a suitable change of the output layer (softmax or normalized exponential) and loss function (the cross entropy), the NN can approximate class probabilities.
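These ingredients (a hidden ReLU layer, the mean squared error loss, gradient backpropagation, and gradient descent) can be assembled into a self-contained sketch that fits a small NN to a smooth one-dimensional function; the network size, learning rate, and target function below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (256, 1))
y = X ** 2                                    # smooth target function to fit

# A single hidden ReLU layer already makes the network a universal approximator.
W1, b1 = rng.normal(0, 1, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 1, (16, 1)), np.zeros(1)
lr = 0.05                                     # gradient-descent step size

for step in range(5000):
    # Forward pass
    h = np.maximum(0, X @ W1 + b1)            # hidden layer, ReLU activation
    y_hat = h @ W2 + b2
    loss = ((y_hat - y) ** 2).mean()          # mean squared error (regression)
    # Backward pass: the gradient flows from the output back to each layer
    g_out = 2 * (y_hat - y) / len(X)
    g_W2, g_b2 = h.T @ g_out, g_out.sum(0)
    g_h = (g_out @ W2.T) * (h > 0)            # ReLU gates the gradient where active
    g_W1, g_b1 = X.T @ g_h, g_h.sum(0)
    # Parameter update in the negative gradient direction
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
```

Swapping the output layer for a softmax and the loss for the cross-entropy turns the same machinery into a classifier.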
      Classification is the other prominent task in ML. Classifiers are simply algorithms that can sort data into groups or categories, and there exists a large variety of them [
      • Erickson B.J.
      • Korfiatis P.
      • Akkus Z.
      • Kline T.L.
      Machine learning for medical imaging.
      ]. Some of the most popular ones are very intuitive and easy to interpret, such as decision trees [
      • Hartmann C.
      • Varshney P.
      • Mehrotra K.
      • Gerberich C.
      Application of information theory to the construction of efficient decision trees.
      ], where input data is classified by going through a hierarchical, tree-like process including different branching tests of the data features (Fig. 6a). Growing several complementary decision trees together, in an ensemble learning strategy, leads to random forests (Fig. 6b, see also Section 3). Other simple algorithms for classification include the linear classifier, the Bayesian classifier, or the Perceptron (Fig. 5a). More sophisticated algorithms can actually be used for both regression and classification tasks. Some examples are NNs (Fig. 5b), which can yield class probabilities with suitable output layers; or support vector machines [
      • Vapnik V.
      Pattern recognition using generalized portrait method.
], which can be seen as an improved linear classifier that works in a higher-dimensional space and tries to fit the separating (hyper)plane with the thickest margin between the points of the two classes (Fig. 7).
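For illustration, the sketch below fits a shallow decision tree and a linear (maximum-margin) SVM to scikit-learn's bundled benign-vs-malignant breast cancer dataset; the depth limit and the standardization step for the SVM are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Benign-vs-malignant classification on tabular features (toy dataset bundled
# with scikit-learn, used here as a stand-in for image-derived features).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# A shallow, easily interpretable decision tree...
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
# ...and a linear SVM fitting the maximum-margin hyperplane (features are
# standardized first, since the margin criterion is scale sensitive).
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
svm_acc = svm.score(X_te, y_te)
```

The tree exposes its decision rules explicitly, whereas the SVM typically trades some interpretability for a better-regularized decision boundary.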
Fig. 6(a) Decision trees assign labels (leaves) to a given sample by going through a multi-level structure where different features (nodes) and outcomes (branches) are tested. (b) In a Random Forest algorithm, decision trees are combined, following an ensemble learning approach, which enables more accurate predictions than a single tree. Each individual tree in the forest outputs a class prediction, and the class with the most votes becomes the final model prediction.
Fig. 7Principle of the linear support vector machine, which lifts the indeterminacy of separable classification by fitting the thickest margin, supported by a few ‘support vectors’. The principle can be extended to nonlinear class separation by using Mercer kernels
      [
      • Vapnik V.
      Pattern recognition using generalized portrait method.
      ]
      .

      State-of-the-art AI methods for medical image analysis

In the last decade, intensive research in AI methods for medical applications, and specifically in ML/DL (Fig. 8, left), has yielded thousands of publications reporting the performance of new algorithms and/or original variants of existing ones. The number of publications using some of the most popular ML/DL methods is presented in Fig. 8. In particular, in recent years, attention has moved from ML methods such as SVMs and Random Forests to Convolutional Neural Networks (Fig. 8, right). In addition, since 2018, the use of other DL methods, such as Generative Adversarial Networks or reinforcement learning algorithms, has been rapidly increasing. Notice that this section is not intended to be an exhaustive review of the application of AI methods to the medical field, but rather an illustration of the potential of these methods. Thus, in the following, we describe the basic methodological aspects of two of the most widely used algorithms (Random Forests and CNNs), as well as the increasingly popular GANs, and we provide some examples of recent applications of these methods to the field of medical image processing.
Fig. 8Number of publications from 2010 to 2020 in the PubMed repository containing keywords related to AI/ML/DL methods in the title and/or abstract.

      Random forests (RFs)

      Random forests (RFs) [
      • Amit Y.
      • Geman D.
      Shape quantization and recognition with randomized trees.
      ,
      • Breiman L.
      Random forests.
      ] use an ensemble of uncorrelated binary decision trees (multiple learning models) to find the best predictive model (Fig. 6). Each decision tree can be seen as a base model (binary classifier) with its respective decision, where a combination of such decisions leads to the final output. This is achieved in RFs by using two distinctive mechanisms, i.e., internal feature selection and voting [

      Konukoglu E, Glocker B. Random forests in medical image computing. In: Handbook of medical image computing and computer assisted intervention, Elsevier; 2020, p. 457–80.

]. The RF algorithm extracts a multitude of low-level (simple) data representations and uses the feature selection mechanism on all collected features to find the most informative ones. After feature selection, a majority vote over the selected trees yields the final decision. For a detailed description of the RF algorithm, we refer to [

      Konukoglu E, Glocker B. Random forests in medical image computing. In: Handbook of medical image computing and computer assisted intervention, Elsevier; 2020, p. 457–80.

      ].
      The earliest applications of RFs date from a decade ago for organ localization [
      • Criminisi A.
      • Shotton J.
      • Bucciarelli S.
      Decision forests with long-range spatial context for organ localization in CT volumes.
      ] and delineation [

      Lempitsky V, Verhoek M, Noble JA, Blake A. Random forest classification for automatic delineation of myocardium in real-time 3D echocardiography. In: Functional imaging and modeling of the heart, Springer Berlin Heidelberg; 2009, p. 447–56.

      ]. Since then, RFs have been applied to numerous tasks, including detection and localization, segmentation, and image-based prediction [

      Konukoglu E, Glocker B. Random forests in medical image computing. In: Handbook of medical image computing and computer assisted intervention, Elsevier; 2020, p. 457–80.

      ]. For some specific applications, RFs have demonstrated an improved performance over other classical ML methods. For instance, Deist et al. [
      • Deist T.M.
      • Dankers F.J.W.M.
      • Valdes G.
      • Wijsman R.
      • Hsu I.-C.
      • Oberije C.
      • et al.
      Machine learning algorithms for outcome prediction in (chemo)radiotherapy: an empirical comparison of classifiers.
      ,

      Differentiation of glioblastoma from solitary brain metastases using radiomic machine-learning classifiers. Cancer Lett 2019;451:128–35.

] compared six different classification algorithms (decision tree, RFs, NNs, SVM, elastic net logistic regression, LogitBoost) on 12 datasets with a total of 3496 patients, for outcome and toxicity prediction in (chemo)radiotherapy. They concluded that RFs achieved the highest discriminative performance (on 6 of the 12 datasets). This is in line with the findings of more fundamental ML research studies, which have reported RFs as one of the best classical learning algorithms [

      Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning, New York, NY, USA: Association for Computing Machinery; 2006, p. 161–8.

      ]. However, many other works in the medical field have also compared the accuracy of RFs against more complex or simpler ML classifiers, and it is well known that their performance may vary for different applications [
      • Valdes G.
      • Luna J.M.
      • Eaton E.
      • Simone 2nd, C.B.
      • Ungar L.H.
      • Solberg T.D.
      MediBoost: a patient stratification tool for interpretable decision making in the era of precision medicine.
      ,
      • Yang F.
      • Chen W.
      • Wei H.
      • Zhang X.
      • Yuan S.
      • Qiao X.
      • et al.
      Machine learning for histologic subtype classification of non-small cell lung cancer: a retrospective multicenter radiomics study.
      ,

      Differentiation of glioblastoma from solitary brain metastases using radiomic machine-learning classifiers. Cancer Lett 2019;451:128–35.

      ,
      • Ma C.
      • Wang R.
      • Zhou S.
      • Wang M.
      • Yue H.
      • Zhang Y.
      • et al.
      The structural similarity index for IMRT quality assurance: radiomics-based error classification.
      ,
      • Akcay M.
      • Etiz D.
      • Celik O.
      Prediction of survival and recurrence patterns by machine learning in gastric cancer cases undergoing radiation therapy and chemotherapy.
      ,
      • Dean J.A.
      • Wong K.H.
      • Welsh L.C.
      • Jones A.-B.
      • Schick U.
      • Newbold K.L.
      • et al.
      Normal tissue complication probability (NTCP) modelling using spatial dose metrics and machine learning methods for severe acute oral mucositis resulting from head and neck radiotherapy.
      ,
      • Qiu X.
      • Gao J.
      • Yang J.
      • Hu J.
      • Hu W.
      • Kong L.
      • et al.
      A Comparison study of machine learning (random survival forest) and classic statistic (cox proportional hazards) for predicting progression in high-grade glioma after proton and carbon ion radiotherapy.
      ,
      • Dean J.
      • Wong K.
      • Gay H.
      • Welsh L.
      • Jones A.-B.
      • Schick U.
      • et al.
      Incorporating spatial dose metrics in machine learning-based normal tissue complication probability (NTCP) models of severe acute dysphagia resulting from head and neck radiotherapy.
      ,
      • Luo Y.
      • Chen S.
      • Valdes G.
      Machine learning for radiation outcome modeling and prediction.
      ] and even for different datasets within the same application [
      • Deist T.M.
      • Dankers F.J.W.M.
      • Valdes G.
      • Wijsman R.
      • Hsu I.-C.
      • Oberije C.
      • et al.
      Machine learning algorithms for outcome prediction in (chemo)radiotherapy: an empirical comparison of classifiers.
      ,

      Differentiation of glioblastoma from solitary brain metastases using radiomic machine-learning classifiers. Cancer Lett 2019;451:128–35.

      ]. This makes it hard to conclude on the absolute superiority of the RF algorithm over other ML classifiers. Nevertheless, the work of Deist et al. included one of the largest datasets investigated so far for radiotherapy outcome prediction, which is a strong argument for considering RFs as one of the first options to investigate for this kind of application. In addition, RFs keep achieving very promising results in recent applications related to outcome prediction [
      • Akcay M.
      • Etiz D.
      • Celik O.
      Prediction of survival and recurrence patterns by machine learning in gastric cancer cases undergoing radiation therapy and chemotherapy.
      ,
      • Luo Y.
      • Chen S.
      • Valdes G.
      Machine learning for radiation outcome modeling and prediction.
      ,
      • Luna J.M.
      • Chao H.-H.
      • Diffenderfer E.S.
      • Valdes G.
      • Chinniah C.
      • Ma G.
      • et al.
      Predicting radiation pneumonitis in locally advanced stage II–III non-small cell lung cancer using machine learning.
      ,
      • Valdes G.
      • Chang A.J.
      • Interian Y.
      • Owen K.
      • Jensen S.T.
      • Ungar L.H.
      • et al.
      Salvage HDR brachytherapy: multiple hypothesis testing versus machine learning analysis.
      ,
      • Meti N.
      • Saednia K.
      • Lagree A.
      • Tabbarah S.
      • Mohebpour M.
      • Kiss A.
      • et al.
      Machine learning frameworks to predict neoadjuvant chemotherapy response in breast cancer using clinical and pathological features.
      ,
      • Zhou G.-Q.
      • Wu C.-F.
      • Deng B.
      • Gao T.-S.
      • Lv J.-W.
      • Lin L.
      • et al.
      An optimal posttreatment surveillance strategy for cancer survivors based on an individualized risk-based approach.
      ], but also for other domains like image classification [
      • Yang F.
      • Chen W.
      • Wei H.
      • Zhang X.
      • Yuan S.
      • Qiao X.
      • et al.
      Machine learning for histologic subtype classification of non-small cell lung cancer: a retrospective multicenter radiomics study.
      ,
      • Novak J.
      • Zarinabad N.
      • Rose H.
      • Arvanitis T.
      • MacPherson L.
      • Pinkey B.
      • et al.
      Classification of paediatric brain tumours by diffusion weighted imaging and machine learning.
      ] or automatic treatment planning [
      • McIntosh C.
      • Welch M.
      • McNiven A.
      • Jaffray D.A.
      • Purdie T.G.
      Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.
      ,
      • Sheng Y.
      • Li T.
      • Yoo S.
      • Yin F.-F.
      • Blitzblau R.
      • Horton J.K.
      • et al.
      Automatic planning of whole breast radiation therapy using machine learning models.
      ,
      • McIntosh C.
      • Purdie T.G.
      Contextual atlas regression forests: multiple-atlas-based automated dose prediction in radiation therapy.
      ,
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      The importance of evaluating the complete automated knowledge-based planning pipeline.
      ]. Regarding other tasks where RFs were among the state-of-the-art methods a few years ago, like image synthesis [
      • Jog A.
      • Carass A.
      • Roy S.
      • Pham D.L.
      • Prince J.L.
      Random forest regression for magnetic resonance image synthesis.
      ,
      • Lei Y.
      • Harms J.
      • Wang T.
      • Tian S.
      • Zhou J.
      • Shu H.-K.
      • et al.
      MRI-based synthetic CT generation using semantic random forest with iterative refinement.
      ,
      • Yang X.
      • Lei Y.
      • Shu H.-K.
      • Rossi P.
      • Mao H.
      • Shim H.
      • et al.
      Pseudo CT estimation from MRI using patch-based random forest.
      ] or segmentation [
      • Polan D.F.
      • Brady S.L.
      • Kaufman R.A.
      Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study.
      ,
      • Gao Y.
      • Shao Y.
      • Lian J.
      • Wang A.Z.
      • Chen R.C.
      • Shen D.
      Accurate segmentation of CT male pelvic organs via regression-based deformable models and multi-task random forests.
      ], the community has now fully shifted its attention to CNNs [
      • Wang T.
      • Lei Y.
      • Fu Y.
      • Wynne J.F.
      • Curran W.J.
      • Liu T.
      • et al.
      A review on medical imaging synthesis using deep learning and its clinical applications.
      ,
      • Wang T.
      • Lei Y.
      • Fu Y.
      • Curran W.J.
      • Liu T.
      • Nye J.A.
      • et al.
      Machine learning in quantitative PET: a review of attenuation correction and low-count image reconstruction methods.
      ,
      • Seo H.
      • Badiei Khuzani M.
      • Vasudevan V.
      • Huang C.
      • Ren H.
      • Xiao R.
      • et al.
      Machine learning techniques for biomedical image segmentation: an overview of technical aspects and introduction to state-of-art applications.
      ]. Nevertheless, in favor of RFs one could argue that they are easy to implement and less computationally expensive than CNNs (i.e., they can run on a regular CPU). Therefore, they still deserve an important place in the ML toolbox for medical imaging.

      Convolutional neural networks (CNNs)

      Convolutional neural networks (CNNs) are inspired by the human visual system and exploit the spatial arrangement of data within images. Their remarkable capacity to detect hierarchical data representations has made CNNs the most popular architecture for current medical image processing applications.
      Traditionally, CNNs stack successive layers of convolutions and down-sampling, followed by fully connected layers towards the output (Fig. 9). The sequential application of multiple convolutions enables the network to first extract simple features, like edges, in the early layers, which are then combined and refined into richer, more complex, hierarchical features, like full organs, in the deeper layers. Within each convolutional layer, feature saliency is determined by scanning a fixed-size convolution kernel (typically 3×3) over the whole image to yield a feature map. This allows for an economy of parameters (weight sharing) and hence easier training. Down-sampling layers are inserted between convolutional layers to reduce the size of the feature maps, typically by applying a max-pooling operation, which keeps the maximum pixel value out of each non-overlapping 2-by-2 block in the feature map. To some extent, successive max-pooling provides some shift invariance with respect to image content, as the salient maximum may stem from anywhere in the block. Down-sampling also trades resolution for capacity, as more convolution filters can be applied to smaller feature maps within the same memory footprint. Finally, fully connected layers, where all neurons are inter-connected, generate the outputs.
      Fig. 9. Typical architecture for a (deep) Convolutional Neural Network (CNN). Different convolutional kernels scan the input images, leading to several feature maps. Then, down-sampling operations, such as max-pooling (i.e., taking the maximum value of a block of pixels), are applied to reduce the size of the feature maps. These two operations, convolution and pooling, are applied multiple times to extract higher-level features. At the end, the feature maps are flattened and passed through fully connected layers of neurons to obtain a final prediction. The embedded (automatic and unsupervised) feature extraction is what enables CNNs to remove all hand-crafted operations and makes them so powerful.
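      The two core CNN operations, convolution and max-pooling, can be made concrete in a few lines of NumPy. The sketch below is a minimal, single-channel illustration with no learning: a hand-crafted edge-detection kernel plays the role of a learned convolution filter, followed by 2×2 max-pooling.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) of one kernel over one image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # toy image: vertical edge down the middle
sobel_x = np.array([[1., 0., -1.],     # horizontal-gradient (edge) kernel
                    [2., 0., -2.],
                    [1., 0., -1.]])

fmap = conv2d(img, sobel_x)            # 6x6 feature map highlighting the edge
pooled = max_pool2x2(fmap)             # 3x3 map after down-sampling
```

The feature map responds strongly only where the edge lies, and pooling halves each spatial dimension while keeping the salient response, exactly the resolution-for-capacity trade described above.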
      Fully convolutional networks (FCNs) [

      Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR) 2015. https://doi.org/10.1109/cvpr.2015.7298965.

      ] were proposed to efficiently perform image-to-image tasks like segmentation. In CNNs, repeated convolution and max-pooling layers lead to low-resolution, abstract outputs. To return to full-resolution images, the fully connected layers of CNNs are replaced in FCNs with operations that reverse convolution and max-pooling, such as transposed convolutions and up-sampling. Following the same line, the U-net [

      Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: medical image computing and computer-assisted intervention – MICCAI 2015, Springer International Publishing; 2015, p. 234–41.

      ] was presented for biomedical image segmentation and is now widely used in medical imaging. It is an encoder-decoder network, where the encoder can be seen as a feature extraction block, and the decoder as an output generation block. Within medical imaging, FCNs are used in both supervised and unsupervised settings, depending on the architecture. In supervised training, FCNs are mostly used for discriminative tasks, such as detection, localization, classification, segmentation, and denoising. Note that the terms CNN and FCN are often used interchangeably.
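      The encoder-decoder idea can be illustrated with a toy resolution path in NumPy. This is only a shape-level sketch of the U-net layout: real networks interleave learned convolutions at every scale and typically use learned transposed convolutions rather than the nearest-neighbour up-sampling used here; all variable names are ours.

```python
import numpy as np

def downsample(x):
    """Encoder step: 2x2 max-pooling halves the spatial resolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample(x):
    """Decoder step: nearest-neighbour up-sampling doubles the resolution."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(64, dtype=float).reshape(8, 8)   # toy input "image"

# Encoder: resolution shrinks while (in a real U-net) the channel count grows.
e1 = downsample(x)            # 4x4
e2 = downsample(e1)           # 2x2 bottleneck

# Decoder: resolution is restored step by step, back to the input size.
d1 = upsample(e2)             # 4x4
skip = np.stack([d1, e1])     # skip connection: concatenate encoder features
d0 = upsample(d1)             # 8x8, full resolution again
```

The skip connections (here a simple channel-wise stack) are what let the decoder recover fine spatial detail lost in the bottleneck.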
      For certain applications, such as image segmentation [
      • Seo H.
      • Badiei Khuzani M.
      • Vasudevan V.
      • Huang C.
      • Ren H.
      • Xiao R.
      • et al.
      Machine learning techniques for biomedical image segmentation: an overview of technical aspects and introduction to state-of-art applications.
      ,
      • Lustberg T.
      • van Soest J.
      • Gooding M.
      • Peressutti D.
      • Aljabar P.
      • van der Stoep J.
      • et al.
      Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer.
      ,
      • Vrtovec T.
      • Močnik D.
      • Strojan P.
      • Pernuš F.
      • Ibragimov B.
      Auto-segmentation of organs at risk for head and neck radiotherapy planning: From atlas-based to deep learning methods.
      ] or synthesis [
      • Wang T.
      • Lei Y.
      • Fu Y.
      • Wynne J.F.
      • Curran W.J.
      • Liu T.
      • et al.
      A review on medical imaging synthesis using deep learning and its clinical applications.
      ], CNNs are now considered the state-of-the-art methods [
      • Litjens G.
      • Kooi T.
      • Bejnordi B.E.
      • Setio A.A.A.
      • Ciompi F.
      • Ghafoorian M.
      • et al.
      A survey on deep learning in medical image analysis.
      ]. Although the comparison of different algorithms on the same dataset is not so common, an excellent way to track the evolution of the state-of-the-art algorithms is to look at the challenges and competitions organised around specific topics. In certain cases, CNNs have clearly surpassed the performance of more classical methods. A good example is the Challenge on Liver Ultrasound Tracking (CLUST): the winning team in the first edition (2014) achieved a tracking error of 1.51 ± 1.88 mm using an approach based on image registration algorithms [

      König L, Kipshagen T, Rühaak J. A non-linear image registration scheme for real-time liver ultrasound tracking using normalized gradient fields. In: Proc MICCAI CLUST14, Boston, USA 2014:29–36.

      ]; whereas the current best performing algorithm, based on CNNs, achieves under 1 mm accuracy (0.69 ± 0.67 mm), demonstrating a more robust model [
      • Liu F.
      • Liu D.
      • Tian J.
      • Xie X.
      • Yang X.
      • Wang K.
      Cascaded one-shot deformable convolutional neural networks: developing a deep learning model for respiratory motion estimation in ultrasound sequences.
      ]. Another example is the database from the MICCAI Head and Neck Auto-segmentation Challenge 2015 [
      • Raudaschl P.F.
      • Zaffino P.
      • Sharp G.C.
      • Spadea M.F.
      • Chen A.
      • Dawant B.M.
      • et al.
      Evaluation of segmentation methods on head and neck CT: auto-segmentation challenge 2015.
      ], for which the most recent methods based on CNNs [

      Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

      ,
      • Zhu W.
      • Huang Y.
      • Zeng L.
      • Chen X.
      • Liu Y.
      • Qian Z.
      • et al.
      AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy.
      ,
      • Gou S.
      • Tong N.
      • Qi S.
      • Yang S.
      • Chin R.
      • Sheng K.
      Self-channel-and-spatial-attention neural network for automated multi-organ segmentation on head and neck CT images.
      ,
      • Liang S.
      • Thung K.-H.
      • Nie D.
      • Zhang Y.
      • Shen D.
      Multi-view spatial aggregation framework for joint localization and segmentation of organs at risk in head and neck CT images.
      ] have improved the Dice coefficients obtained at that time with model- and atlas-based algorithms by more than 3% on average. In particular, the work of Nikolov et al. [

      Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

      ] has recently reported a U-Net architecture with an accuracy equivalent to experienced radiographers. These are just two of the many competitions organised around medical imaging tasks [

      Grand Challenge n.d. https://grand-challenge.org/ (accessed February 17, 2021).

      ], but year after year CNNs are becoming the backbone of the best performing algorithms.
      Some of the latest methodological improvements to CNN architectures that have contributed to more robust and accurate models include the coarse-to-fine cascading of two CNNs [
      • Gerard S.E.
      • Patton T.J.
      • Christensen G.E.
      • Bayouth J.E.
      • Reinhardt J.M.
      FissureNet: a deep learning approach for pulmonary fissure detection in CT images.
      ] to address class-imbalance issues; the addition of squeeze-and-excitation (SE)-blocks to allow the network to model the channel and spatial information separately [
      • Liu P.
      • Dou Q.
      • Wang Q.
      • Heng P.-A.
      An encoder-decoder neural network with 3D squeeze-and-excitation and deep supervision for brain tumor segmentation.
      ], thus increasing the model capacity; or the implementation of attention mechanisms, which enable the network to focus on the most relevant features [
      • Hu H.
      • Li Q.
      • Zhao Y.
      • Zhang Y.
      Parallel deep learning algorithms with hybrid attention mechanism for image segmentation of lung tumors.
      ,
      • Zhou T.
      • Ruan S.
      • Guo Y.
      • Canu S.
      A multi-modality fusion network based on attention mechanism for brain tumor segmentation.
      ,
      • Schlemper J.
      • Oktay O.
      • Schaap M.
      • Heinrich M.
      • Kainz B.
      • Glocker B.
      • et al.
      Attention gated networks: learning to leverage salient regions in medical images.
      ].
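      As an illustration of one of these mechanisms, a squeeze-and-excitation block can be sketched in NumPy as follows. This is a simplified, single-sample version with random (untrained) weights; the shapes and variable names are illustrative assumptions, not the cited implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(fmaps, w1, w2):
    """Squeeze-and-excitation: re-weight channels using global statistics.

    fmaps: (C, H, W) feature maps; w1: (C/r, C) and w2: (C, C/r) are the
    two fully connected layers of the excitation step (ReLU then sigmoid).
    """
    squeeze = fmaps.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)     # bottleneck FC + ReLU
    scale = sigmoid(w2 @ hidden)               # per-channel gates in (0, 1)
    return fmaps * scale[:, None, None]        # channel-wise re-weighting

rng = np.random.default_rng(0)
C, r = 8, 2                                    # channels and reduction ratio
fmaps = rng.normal(size=(C, 16, 16))           # toy feature maps
w1 = rng.normal(size=(C // r, C))              # random (untrained) weights
w2 = rng.normal(size=(C, C // r))
out = se_block(fmaps, w1, w2)
```

Each output channel is the input channel scaled by a gate in (0, 1), which is how the block models channel information separately from the spatial content.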
      Besides image segmentation, other recent successful applications of CNNs include classification [
      • Jiang H.
      • Gao F.
      • Xu X.
      • Huang F.
      • Zhu S.
      Attentive and ensemble 3D dual path networks for pulmonary nodules classification.
      ,
      • McKinney S.M.
      • Sieniek M.
      • Godbole V.
      • Godwin J.
      • Antropova N.
      • Ashrafian H.
      • et al.
      International evaluation of an AI system for breast cancer screening.
      ], outcome prediction [
      • Xu Y.
      • Hosny A.
      • Zeleznik R.
      • Parmar C.
      • Coroller T.
      • Franco I.
      • et al.
      Deep learning predicts lung cancer treatment response from serial medical imaging.
      ,
      • Ibragimov B.
      • Toesca D.
      • Chang D.
      • Yuan Y.
      • Koong A.
      • Xing L.
      Development of deep neural network for individualized hepatobiliary toxicity prediction after liver SBRT.
      ,
      • Sekaran K.
      • Chandana P.
      • Murali Krishna N.
      • Kadry S.
      Deep learning convolutional neural network (CNN) With Gaussian mixture model for predicting pancreatic cancer.
      ], automatic treatment planning [
      • Kandalan R.N.
      • Nguyen D.
      • Rezaeian N.H.
      • Barragán-Montero A.M.
      • Breedveld S.
      • Namuduri K.
      • et al.
      Dose prediction with deep learning for prostate cancer radiation therapy: model adaptation to different treatment planning practices.
      ,
      • Nguyen D.
      • Jia X.
      • Sher D.
      • Lin M.-H.
      • Iqbal Z.
      • Liu H.
      • et al.
      3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture.
      ,
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ], motion tracking [
      • Roggen T.
      • Bobic M.
      • Givehchi N.
      • Scheib S.G.
      Deep learning model for markerless tracking in spinal SBRT.
      ,
      • Mori S.
      • Hirai R.
      • Sakata Y.
      Simulated four-dimensional CT for markerless tumor tracking using a deep learning network with multi-task learning.
      ] or image enhancement [
      • Javaid U.
      • Souris K.
      • Dasnoy D.
      • Huang S.
      • Lee J.A.
      Mitigating inherent noise in Monte Carlo dose distributions using dilated U-Net.
      ]. In numerous applications, CNNs have demonstrated an accuracy similar to that of human experts [

      Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

      ,
      • Becker A.S.
      • Mueller M.
      • Stoffel E.
      • Marcon M.
      • Ghafoor S.
      • Boss A.
      Classification of breast cancer in ultrasound imaging using a generic deep learning analysis software: a pilot study.
      ,
      • Schreier J.
      • Genghi A.
      • Laaksonen H.
      • Morgas T.
      • Haas B.
      Clinical evaluation of a full-image deep segmentation algorithm for the male pelvis on cone-beam CT and CT.
      ,
      • Wong J.
      • Fong A.
      • McVicar N.
      • Smith S.
      • Giambattista J.
      • Wells D.
      • et al.
      Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning.
      ], decreased the interobserver variability [
      • Gatos I.
      • Tsantis S.
      • Spiliopoulos S.
      • Karnabatidis D.
      • Theotokas I.
      • Zoumpoulis P.
      • et al.
      Temporal stability assessment in shear wave elasticity images validated by deep learning neural network for chronic liver disease fibrosis stage assessment.
      ,
      • Kagadis G.C.
      • Drazinos P.
      • Gatos I.
      • Tsantis S.
      • Papadimitroulas P.
      • Spiliopoulos S.
      • et al.
      Deep learning networks on chronic liver disease assessment with fine-tuning of shear wave elastography image sequences.
      ] or reduced the physician’s workload for a specific task [
      • Lustberg T.
      • van Soest J.
      • Gooding M.
      • Peressutti D.
      • Aljabar P.
      • van der Stoep J.
      • et al.
      Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer.
      ]

      Generative adversarial networks (GANs)

      Generative adversarial networks (GANs) [
      • Goodfellow I.
      • Pouget-Abadie J.
      • Mirza M.
      • Xu B.
      • Warde-Farley D.
      • Ozair S.
      • et al.
      Generative adversarial networks.
      ] are popular architectures used for generative modeling. A GAN consists of two networks: a generator (G) and a discriminator (D) (Fig. 10). The intuition is that G iteratively tries to map a random input distribution to a given data distribution in order to generate new data, which D evaluates. Depending on the feedback from D, G tends to minimize the loss between the two distributions, thus generating samples similar to the input data. The goal is to trick D into classifying generated data as real. Both networks are trained simultaneously to get better at their respective tasks: while G is learning to fool D, D is concurrently learning to better distinguish generated data from real input data. Note that both G and D are generally CNNs trained in an adversarial setup.
      Fig. 10. Structure of Generative Adversarial Networks (GANs). Starting from random noise, the generator (G) uses the feedback from the discriminator (D) and learns to create images that are similar to the provided ground truth.
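      The adversarial objective can be written down concretely. The NumPy sketch below evaluates the standard discriminator loss and the commonly used non-saturating generator loss on hand-picked discriminator outputs; the numeric values are purely illustrative, not results from any cited model.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: classify real samples as 1 and generated ones as 0."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: push D to output 1 on generated samples."""
    return -np.mean(np.log(d_fake))

# Illustrative D outputs (probabilities of "real") on real and generated batches.
d_real = np.array([0.9, 0.8, 0.95])     # D is confident these are real
d_fake = np.array([0.1, 0.2, 0.05])     # D spots the fakes -> G loss is high

early = g_loss(d_fake)                              # large: G is not fooling D
late = g_loss(np.array([0.8, 0.9, 0.85]))           # small: G now fools D
```

Training alternates gradient steps that decrease `d_loss` for D and `g_loss` for G, so the two losses pull the networks in opposite directions, which is the adversarial game described above.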
      Unlike CNNs, which have relatively old foundations dating back to 1980 (Section 2), adversarial learning is a rather new concept. However, it has rapidly taken root in the medical imaging field, leading to numerous publications in the last few years [
      • Yi X.
      • Walia E.
      • Babyn P.
      Generative adversarial network in medical imaging: a review.
      ,
      • Lan L.
      • You L.
      • Zhang Z.
      • Fan Z.
      • Zhao W.
      • Zeng N.
      • et al.
      Generative adversarial networks and its applications in biomedical informatics.
      ]. The initially proposed architecture for GANs [
      • Goodfellow I.
      • Pouget-Abadie J.
      • Mirza M.
      • Xu B.
      • Warde-Farley D.
      • Ozair S.
      • et al.
      Generative adversarial networks.
      ] suffered from several drawbacks, such as very unstable training, but intensive research in the field of computer vision has led to substantial improvements, either by changing the architecture of G and D or by investigating new loss functions [
      • Yi X.
      • Walia E.
      • Babyn P.
      Generative adversarial network in medical imaging: a review.
      ,

      Gonog L, Zhou Y. A Review: generative adversarial networks. In: 2019 14th IEEE conference on industrial electronics and applications (ICIEA), 2019, p. 505–10.

      ]. A way to better control the data generation process in GANs is to provide extra information about the desired properties of the output (e.g. examples of the desired real images or labels). This is known as conditional GANs (cGANs) [
      • Isola P.
      • Zhu J.-Y.
      • Zhou T.
      • Efros A.A.
      Image-to-image translation with conditional adversarial networks.
      ], and it can be categorized as a form of supervised learning since it requires aligned training pairs. However, we believe that the real strength of GANs lies in their ability to learn in a semi-supervised or fully unsupervised manner. Specifically, in the medical imaging field, where aligned and properly annotated image pairs are seldom available, GANs are starting to play a very important role. In this context, the cycleGAN [
      • Zhu J.-Y.
      • Park T.
      • Isola P.
      • Efros A.A.
      Unpaired image-to-image translation using cycle-consistent adversarial networks.
      ] is probably one of the most famous architectures, allowing bidirectional mapping between two domains using unpaired input data.
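      The cycle-consistency idea behind cycleGANs can be illustrated with a toy example. Below, two simple linear functions stand in for the generator networks G : X→Y and F : Y→X; the L1 cycle-consistency loss vanishes when the two mappings invert each other and grows when they do not. All functions and values are illustrative stand-ins, not the cited architecture.

```python
import numpy as np

# Toy "domains": X and Y are related by an affine map unknown to the networks.
# G and F are simple linear stand-ins for the two generator networks.
a, b = 2.0, 1.0
G = lambda x: a * x + b            # forward mapping, e.g. MR -> synthetic CT
F = lambda y: (y - b) / a          # backward mapping, e.g. CT -> synthetic MR

def cycle_loss(x, y):
    """L1 cycle-consistency: F(G(x)) should recover x, and G(F(y)) recover y."""
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))

x = np.linspace(-1, 1, 5)          # unpaired samples from domain X
y = np.linspace(0, 4, 5)           # unpaired samples from domain Y
loss = cycle_loss(x, y)            # ~0: the two mappings are mutually consistent

bad_F = lambda y: y / a            # an inconsistent backward mapping
bad = (np.mean(np.abs(bad_F(G(x)) - x))
       + np.mean(np.abs(G(bad_F(y)) - y)))   # clearly larger than `loss`
```

Because the loss only compares each sample with its own round trip, no aligned X-Y pairs are ever needed, which is precisely what makes cycleGANs attractive for unpaired medical images.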
      So far, in the medical imaging field, GANs have been mostly applied to synthetic image generation for data augmentation [
      • Sandfort V.
      • Yan K.
      • Pickhardt P.J.
      • Summers R.M.
      Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks.
      ,
      • Shao S.
      • Wang P.
      • Yan R.
      Generative adversarial networks for data augmentation in machine fault diagnosis.
      ,
      • Shorten C.
      • Khoshgoftaar T.M.
      A survey on image data augmentation for deep learning.
      ] and multi-modality image translation (e.g. MR to CT [
      • Kazemifar S.
      • McGuire S.
      • Timmerman R.
      • Wardak Z.
      • Nguyen D.
      • Park Y.
      • et al.
      MRI-only brain radiotherapy: assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach.
      ,
      • Maspero M.
      • Savenije M.H.F.
      • Dinkla A.M.
      • Seevinck P.R.
      • Intven M.P.W.
      • Jurgenliemk-Schulz I.M.
      • et al.
      Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy.
      ,
      • Wolterink J.M.
      • Dinkla A.M.
      • Savenije M.H.F.
      • Seevinck P.R.
      • van den Berg C.A.T.
      • Išgum I.
      Deep MR to CT synthesis using unpaired data.
      ,
      • Kazemifar S.
      • Barragán Montero A.M.
      • Souris K.
      • Rivas S.T.
      • Timmerman R.
      • Park Y.K.
      • et al.
      Dosimetric evaluation of synthetic CT generated with GANs for MRI-only proton therapy treatment planning of brain tumors.
      ], CBCT to CT [
      • Liang X.
      • Chen L.
      • Nguyen D.
      • Zhou Z.
      • Gu X.
      • Yang M.
      • et al.
      Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy.
      ,
      • Kurz C.
      • Maspero M.
      • Savenije M.H.F.
      • Landry G.
      • Kamp F.
      • Pinto M.
      • et al.
      CBCT correction using a cycle-consistent generative adversarial network and unpaired training to enable photon and proton dose calculation.
      ], among others [
      • Wang T.
      • Lei Y.
      • Fu Y.
      • Wynne J.F.
      • Curran W.J.
      • Liu T.
      • et al.
      A review on medical imaging synthesis using deep learning and its clinical applications.
      ,
      • Yi X.
      • Walia E.
      • Babyn P.
      Generative adversarial network in medical imaging: a review.
      ]). Regarding data augmentation applications, we believe that GAN-based models have the potential to better sample the whole data distribution and generate more realistic images than traditional approaches (e.g., rotation, flipping, etc.), which may contribute to higher model generalizability [
      • Sandfort V.
      • Yan K.
      • Pickhardt P.J.
      • Summers R.M.
      Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks.
      ] and more efficient training [

      Bowles C, Chen L, Guerrero R, Bentley P, Gunn R, Hammers A, et al. GAN augmentation: augmenting training data using generative adversarial networks 2018.

      ]. For multi-modality image translation, although cGANs have achieved good results [
      • Kazemifar S.
      • McGuire S.
      • Timmerman R.
      • Wardak Z.
      • Nguyen D.
      • Park Y.
      • et al.
      MRI-only brain radiotherapy: assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach.
      ,
      • Maspero M.
      • Savenije M.H.F.
      • Dinkla A.M.
      • Seevinck P.R.
      • Intven M.P.W.
      • Jurgenliemk-Schulz I.M.
      • et al.
      Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy.
      ,
      • Kazemifar S.
      • Barragán Montero A.M.
      • Souris K.
      • Rivas S.T.
      • Timmerman R.
      • Park Y.K.
      • et al.
      Dosimetric evaluation of synthetic CT generated with GANs for MRI-only proton therapy treatment planning of brain tumors.
      ,
      • Qi M.
      • Li Y.
      • Wu A.
      • Jia Q.
      • Li B.
      • Sun W.
      • et al.
      Multi-sequence MR image-based synthetic CT generation using a generative adversarial network for head and neck MRI-only radiotherapy.
      ], cycleGANs usually outperform them in terms of accuracy, in addition to overcoming the issues related to paired image training (i.e., inaccurate alignment or labelling) [
      • Liu Y.
      • Lei Y.
      • Wang T.
      • Fu Y.
      • Tang X.
      • Curran W.J.
      • et al.
      CBCT-based synthetic CT generation using deep-attention cycleGAN for pancreatic adaptive radiotherapy.
      ,
      • Liang X.
      • Chen L.
      • Nguyen D.
      • Zhou Z.
      • Gu X.
      • Yang M.
      • et al.
      Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy.
      ,
      • Kurz C.
      • Maspero M.
      • Savenije M.H.F.
      • Landry G.
      • Kamp F.
      • Pinto M.
      • et al.
      CBCT correction using a cycle-consistent generative adversarial network and unpaired training to enable photon and proton dose calculation.
      ,
      • Liu Y.
      • Lei Y.
      • Wang Y.
      • Shafai-Erfani G.
      • Wang T.
      • Tian S.
      • et al.
      Evaluation of a deep learning-based pelvic synthetic CT generation technique for MRI-based prostate proton treatment planning.
      ]. Besides image translation, GANs have also been applied to other tasks, such as segmentation [
      • Wang Y.
      • Wang S.
      • Chen J.
      • Wu C.
      Whole mammographic mass segmentation using attention mechanism and multiscale pooling adversarial network.
      ,
      • Rezaei M.
      • Yang H.
      • Meinel C.
      Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation.
      ,
      • Khosravan N.
      • Mortazi A.
      • Wallace M.
      • Bagci U.
      PAN: projective adversarial network for medical image segmentation.
      ,
      • Jia H.
      • Xia Y.
      • Song Y.
      • Zhang D.
      • Huang H.
      • Zhang Y.
      • et al.
      3D APA-Net: 3D adversarial pyramid anisotropic convolutional network for prostate segmentation in MR images.
      ,
      • Huo Y.
      • Xu Z.
      • Bao S.
      • Bermudez C.
      • Plassard A.J.
      • Liu J.
      • et al.
      Splenomegaly segmentation using global convolutional kernels and conditional generative adversarial networks.
      ,
      • Dong X.
      • Lei Y.
      • Wang T.
      • Thomas M.
      • Tang L.
      • Curran W.J.
      • et al.
      Automatic multiorgan segmentation in thorax CT images using U-net-GAN.
      ], radiotherapy dose prediction [
      • Kearney V.
      • Chan J.W.
      • Wang T.
      • Perry A.
      • Descovich M.
      • Morin O.
      • et al.
      DoseGAN: a generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation.
      ,
      • Babier A.
      • Mahmood R.
      • McNiven A.L.
      • Diamant A.
      • Chan T.C.Y.
      Knowledge-based automated planning with three-dimensional generative adversarial networks.
      ,
      • Murakami Y.
      • Magome T.
      • Matsumoto K.
      • Sato T.
      • Yoshioka Y.
      • Oguchi M.
      Fully automated dose prediction using generative adversarial networks in prostate cancer patients.
      ,

      Mahmood R, Babier A, McNiven A, Diamant A, Chan TCY. Automated treatment planning in radiation therapy using generative adversarial networks. In: Doshi-Velez F, Fackler J, Jung K, Kale D, Ranganath R, Wallace B, et al., editors. Proceedings of the 3rd machine learning for healthcare conference, vol. 85, Palo Alto, California: PMLR; 2018, p. 484–99.

      ], or artifact reduction [
      • Koike Y.
      • Anetai Y.
      • Takegawa H.
      • Ohira S.
      • Nakamura S.
      • Tanigawa N.
      Deep learning-based metal artifact reduction using cycle-consistent adversarial network for intensity-modulated head and neck radiation therapy treatment planning.
      ], among others [
      • Yi X.
      • Walia E.
      • Babyn P.
      Generative adversarial network in medical imaging: a review.
      ].
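      The cycle-consistency loss is what allows cycleGANs to learn from unpaired images. The following minimal sketch, with toy linear mappings standing in for the real convolutional generators (all names and values are illustrative, not taken from the cited works), shows how the loss is assembled:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two generators of a cycleGAN: G maps modality A -> B
# (e.g. CBCT -> synthetic CT) and F maps B -> A. Real generators are deep
# CNNs; simple gain/offset maps are used here only to illustrate the loss.
def G(x):           # hypothetical A -> B mapping
    return 1.1 * x + 0.05

def F(y):           # hypothetical B -> A mapping (inverse of G)
    return (y - 0.05) / 1.1

def cycle_consistency_loss(x_a, y_b, lam=10.0):
    """L1 cycle loss lam * (||F(G(x)) - x|| + ||G(F(y)) - y||).

    This term is what enables *unpaired* training: no aligned (x, y) pairs
    are needed, only that each image survives a round trip through both
    generators.
    """
    loss_a = np.mean(np.abs(F(G(x_a)) - x_a))
    loss_b = np.mean(np.abs(G(F(y_b)) - y_b))
    return lam * (loss_a + loss_b)

x_a = rng.random((4, 64, 64))   # batch of "CBCT" slices
y_b = rng.random((4, 64, 64))   # unrelated, unpaired batch of "CT" slices
loss = cycle_consistency_loss(x_a, y_b)
# Since F inverts G exactly here, the cycle loss is numerically zero.
```

      In a real cycleGAN this term is added to the two adversarial losses and minimized jointly over both generators.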
      All the above-mentioned applications have explored the generative capacity of GANs, but we believe that their discriminative capacity may also have some potential, since the discriminator can be used as a regularizer, or as a detector when provided with abnormal images [
      • Schlegl T.
      • Seeböck P.
      • Waldstein S.M.
      • Schmidt-Erfurth U.
      • Langs G.
      Unsupervised anomaly detection with generative adversarial networks to guide marker discovery.
      ], which might be an excellent application for quality assurance tasks in radiation oncology, for instance.
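      As a hypothetical illustration of this idea, the sketch below replaces the trained discriminator with a toy plausibility score based on assumed training-image statistics, and uses it to flag implausible images for manual QA review (all names, thresholds, and statistics are illustrative):

```python
import numpy as np

# Hypothetical sketch: reusing a GAN discriminator as an anomaly detector
# for quality assurance. A trained discriminator network is replaced here
# by a toy score measuring distance to assumed training-image statistics.
TRAIN_MEAN, TRAIN_STD = 0.5, 0.1     # assumed statistics of "normal" images

def discriminator_score(img):
    """Higher score = more plausible image (stand-in for a trained D)."""
    z = abs(img.mean() - TRAIN_MEAN) / TRAIN_STD
    return float(np.exp(-z))         # in (0, 1], like a sigmoid output

def flag_for_review(images, threshold=0.5):
    """Route images the discriminator finds implausible to manual QA."""
    return [i for i, img in enumerate(images)
            if discriminator_score(img) < threshold]

rng = np.random.default_rng(1)
normal = rng.normal(0.5, 0.1, size=(64, 64))
abnormal = rng.normal(0.9, 0.1, size=(64, 64))  # e.g. artefact-corrupted image
flagged = flag_for_review([normal, abnormal])   # only the abnormal image is flagged
```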

      Discussion and concluding remarks: Where do we go next?

      This article provided an overview of AI with a focus on medical image analysis, paying attention to key methodological concepts and highlighting the potential of state-of-the-art ML and DL methods to automate and improve different steps of the clinical practice. Incorporating such knowledge into the clinical practice and making it accessible to the medical community will definitely help to demystify this technology, inspire new and high-quality research directions, and facilitate the adoption of AI methods in the clinical environment.
      Looking at the evolution of AI methods, one can certainly conclude that shifting from computationalism to connectionism, together with the transition from shallow to deep architectures, has brought a disruptive transformation to the medical field. However, an important part of the research so far has focused on simply translating the latest ML/DL advances in the field of computer vision to medical applications, in order to demonstrate the potential of these methods and the feasibility of using them to improve the clinical practice. This is the case for some of the papers cited in this manuscript, such as the first proofs of concept of the use of CNNs for organ segmentation [
      • Ibragimov B.
      • Xing L.
      Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks.
      ] and for dose prediction for radiotherapy treatments [
      • Nguyen D.
      • Long T.
      • Jia X.
      • Lu W.
      • Gu X.
      • Iqbal Z.
      • et al.
      A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning.
      ], or the use of GANs for conversion between image modalities [
      • Liang X.
      • Chen L.
      • Nguyen D.
      • Zhou Z.
      • Gu X.
      • Yang M.
      • et al.
      Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy.
      ]. Although the technological transfer from computer science to the medical field will certainly continue to bring important progress, the next generation of AI methods for medical applications will only emerge if the medical community steps up to embrace AI technology and integrates all the domain-specific knowledge into state-of-the-art AI methods [
      • Xie X.
      • Niu J.
      • Liu X.
      • Chen Z.
      • Tang S.
      • Yu S.
      A survey on incorporating domain knowledge into deep learning for medical image analysis.
      ,

      Deng C, Ji X, Rainey C, Zhang J, Lu W. Integrating machine learning with human knowledge. iScience 2020;23:101656.

      ]. This can be done in several ways, such as adding extra information in the input channels of the models or using dedicated loss functions during model training. Some groups have already started to explore these research directions. For instance, instead of using generic loss functions from computer vision tasks, like the mean squared error, one could use loss functions that better target the specificities of the medical problem at hand, such as mutual information for the conversion between image modalities [
      • Kazemifar S.
      • McGuire S.
      • Timmerman R.
      • Wardak Z.
      • Nguyen D.
      • Park Y.
      • et al.
      MRI-only brain radiotherapy: assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach.
      ,
      • Kazemifar S.
      • Barragán Montero A.M.
      • Souris K.
      • Rivas S.T.
      • Timmerman R.
      • Park Y.K.
      • et al.
      Dosimetric evaluation of synthetic CT generated with GANs for MRI-only proton therapy treatment planning of brain tumors.
      ] or dose-volume histograms for radiotherapy dose predictions [
      • Nguyen D.
      • McBeth R.
      • Sadeghnejad Barkousaraie A.
      • Bohara G.
      • Shen C.
      • Jia X.
      • et al.
      Incorporating human and learned domain knowledge into training deep neural networks: a differentiable dose-volume histogram and adversarial inspired framework for generating Pareto optimal dose distributions in radiation therapy.
      ]. Regarding the injection of domain-specific knowledge as input to the models, some examples include the addition of electronic health records and clinical data, like text and laboratory results, to the image data [
      • Zhen S.-H.
      • Cheng M.
      • Tao Y.-B.
      • Wang Y.-F.
      • Juengpanich S.
      • Jiang Z.-Y.
      • et al.
      Deep learning for accurate diagnosis of liver tumor based on magnetic resonance imaging and clinical data.
      ,
      • Shehata M.
      • Shalaby A.
      • Switala A.E.
      • El-Baz M.
      • Ghazal M.
      • Fraiwan L.
      • et al.
      A multimodal computer-aided diagnostic system for precise identification of renal allograft rejection: preliminary results.
      ,
      • Huang S.-C.
      • Pareek A.
      • Seyyedi S.
      • Banerjee I.
      • Lungren M.P.
      Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines.
      ], or having first-order prior or approximations of the expected output [
      • Barragán-Montero A.M.
      • Nguyen D.
      • Lu W.
      • Lin M.-H.
      • Norouzi-Kandalan R.
      • Geets X.
      • et al.
      Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations.
      ,
      • Hu J.
      • Song Y.
      • Wang Q.
      • Bai S.
      • Yi Z.
      Incorporating historical sub-optimal deep neural networks for dose prediction in radiotherapy.
      ,
      • Xing Y.
      • Zhang Y.
      • Nguyen D.
      • Lin M.-H.
      • Lu W.
      • Jiang S.
      Boosting radiotherapy dose calculation accuracy with deep learning.
      ,
      • Kontaxis C.
      • Bol G.H.
      • Lagendijk J.J.W.
      • Raaymakers B.W.
      DeepDose: towards a fast dose calculation engine for radiation therapy using deep learning.
      ,
      • Muralidhar N.
      • Islam M.R.
      • Marwah M.
      • Karpatne A.
      • Ramakrishnan N.
      Incorporating prior domain knowledge into deep neural networks.
      ].
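      As an illustrative sketch of such a dedicated loss (not taken from the cited works), the example below combines a generic voxel-wise MSE with a crude stand-in for a dose-volume-histogram term: a penalty on the mean dose error inside an organ-at-risk mask. All names and weights are hypothetical. The two candidate predictions have identical MSE, but the domain term penalizes the clinically harmful overdose:

```python
import numpy as np

# Illustrative domain-informed loss for radiotherapy dose prediction:
# generic voxel-wise MSE plus a penalty on the mean dose error inside an
# organ-at-risk (OAR) mask, a crude stand-in for DVH-based loss terms.
def domain_informed_loss(pred, target, oar_mask, w_domain=0.5):
    mse = np.mean((pred - target) ** 2)                  # generic CV loss
    oar_term = abs(pred[oar_mask].mean() - target[oar_mask].mean())
    return mse + w_domain * oar_term                     # clinically weighted term

rng = np.random.default_rng(2)
target = rng.random((32, 32))                 # "ground-truth" dose map
oar = np.zeros((32, 32), dtype=bool)
oar[8:16, 8:16] = True                        # 64-voxel organ-at-risk region

diffuse = target + 0.075                      # small error spread everywhere
overdose = target.copy()
overdose[oar] += 0.3                          # OAR systematically overdosed

# Both predictions have the same voxel-wise MSE (0.075**2 == 0.3**2 * 64/1024),
# yet the domain term ranks the clinically harmful overdose as worse.
```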
      Integrating domain-specific knowledge can not only serve to improve the performance of state-of-the-art AI models, but also increase the interpretability of their results, which is one of the well-acknowledged limitations of current ML/DL methods [
      • Luo Y.
      • Tseng H.-H.
      • Cui S.
      • Wei L.
      • Ten Haken R.K.
      • El Naqa I.
      Balancing accuracy and interpretability of machine learning approaches for radiation treatment outcomes modeling.
      ,
      • Reyes M.
      • Meier R.
      • Pereira S.
      • Silva C.A.
      • Dahlweid F.-M.
      • von Tengg-Kobligk H.
      • et al.
      On the interpretability of artificial intelligence in radiology: challenges and opportunities.
      ,
      • Murdoch W.J.
      • James Murdoch W.
      • Singh C.
      • Kumbier K.
      • Abbasi-Asl R.
      • Yu B.
      Definitions, methods, and applications in interpretable machine learning.
      ,
      • Barredo Arrieta A.
      • Díaz-Rodríguez N.
      • Del Ser J.
      • Bennetot A.
      • Tabik S.
      • Barbado A.
      • et al.
      Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI.
      ]. This is the idea behind the so-called Expert Augmented Machine Learning (EAML), whose goal is to develop algorithms capable of extracting human knowledge from a panel of experts and using it to establish constraints on the model’s predictions [
      • Gennatas E.D.
      • Friedman J.H.
      • Ungar L.H.
      • Pirracchio R.
      • Eaton E.
      • Reichmann L.G.
      • et al.
      Expert-augmented machine learning.
      ]. This “human-in-the-loop” approach is also useful for training AI models more efficiently. Indeed, some preliminary studies have reported that blindly increasing the size of the training database will not bring much improvement to a model’s performance [
      • Barragán-Montero A.M.
      • Thomas M.
      • Defraene G.
      • Michiels S.
      • Haustermans K.
      • Lee J.A.
      • et al.
      Deep learning dose prediction for IMRT of esophageal cancer: the effect of data quality and quantity on model performance.
      ]. In contrast, active learning [
      • Blanch M.G.
      ] is a type of iterative supervised learning that follows this human-in-the-loop concept, where the algorithm itself queries the user for new data points where they are most needed, in order to build up an optimally balanced training dataset [
      • Smailagic A.
      • Costa P.
      • Noh H.Y.
      • Walawalkar D.
      • Khandelwal K.
      • Galdran A.
      • et al.
      MedAL: accurate and robust deep active learning for medical image analysis.
      ,
      • Kim T.
      • Lee K.
      • Ham S.
      • Park B.
      • Lee S.
      • Hong D.
      • et al.
      Active learning for accuracy enhancement of semantic segmentation with CNN-corrected label curations: evaluation on kidney segmentation in abdominal CT.
      ]. Nevertheless, although a human-centered approach to AI models is certainly the way to go in the near future, parallel research should focus on strategies that alleviate the burden of data labelling, such as semi-supervised, unsupervised, or the increasingly popular self-supervised learning.
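      The query step of active learning can be sketched as pool-based uncertainty sampling; the toy probabilistic classifier below is an illustrative stand-in for the partially trained model, and all names and values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
pool = rng.random((100, 2))                  # unlabelled feature pool

def predict_proba(x):
    """Toy two-class probabilistic classifier with a boundary at x0 = 0.5.

    In a real system this would be the partially trained segmentation or
    classification model itself.
    """
    p = 1.0 / (1.0 + np.exp(-10.0 * (x[:, 0] - 0.5)))
    return np.stack([1.0 - p, p], axis=1)

def query_most_uncertain(x_pool, k=5):
    """Select the k samples whose prediction is closest to 50/50.

    These are the indices the algorithm sends to the human expert for
    labelling, instead of labelling the whole pool up front.
    """
    proba = predict_proba(x_pool)
    uncertainty = 1.0 - np.max(proba, axis=1)   # high when p is near 0.5
    return np.argsort(-uncertainty)[:k]

queried = query_most_uncertain(pool)            # indices to show the expert
```

      Each queried label is then added to the training set and the model retrained, iterating until the labelling budget is exhausted.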
      The quality of the data itself is certainly another important aspect worth discussing. Data collection and curation are indeed of paramount importance, since errors, biases, or variability in the training databases are often directly reflected in the model behavior and can have dramatic consequences for model performance and its clinical outcome. Some examples of these issues include gender imbalance [
      • Larrazabal A.J.
      • Nieto N.
      • Peterson V.
      • Milone D.H.
      • Ferrante E.
      Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis.
      ], racial bias [

      Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. In: Yearbook of paediatric endocrinology 2020. https://doi.org/10.1530/ey.17.12.7.

      ], or data heterogeneity due to changes in treatment protocols over time [
      • Barragán-Montero A.M.
      • Thomas M.
      • Defraene G.
      • Michiels S.
      • Haustermans K.
      • Lee J.A.
      • et al.
      Deep learning dose prediction for IMRT of esophageal cancer: the effect of data quality and quantity on model performance.
      ]. Despite progress in AI methods, data collection remains poorly automated, and the time dedicated to data collection and curation is often overly long. In fact, most state-of-the-art AI algorithms can be trained in a few hours, whereas building a large-scale, well-curated database can take months. Therefore, in the same way that physicians are familiar with planning protocols or delineation guidelines, clinical teams should become familiar with guiding principles for data management and curation in the era of AI. The FAIR (Findability, Accessibility, Interoperability, and Reusability) Data Principles [
      • Wilkinson M.D.
      • Dumontier M.
      • Aalbersberg I.J.J.
      • Appleton G.
      • Axton M.
      • Baak A.
      • et al.
      The FAIR guiding principles for scientific data management and stewardship.
      ] are the most popular and general ones, but the medical community should focus efforts on adapting those principles to the specificities of the medical domain [
      • Kohli M.D.
      • Summers R.M.
      • Raymond Geis J.
      Medical image data and datasets in the era of machine learning—whitepaper from the 2016 C-MIMI meeting dataset session.
      ,
      • Harvey H.
      • Glocker B.
      A standardised approach for preparing imaging data for machine learning tasks in radiology.
      ,
      • Kortesniemi M.
      • Tsapaki V.
      • Trianni A.
      • Russo P.
      • Maas A.
      • Källman H.-E.
      • et al.
      The European Federation of Organisations for Medical Physics (EFOMP) White Paper: big data and deep learning in medical imaging and in relation to medical physics profession.
      ]. Only in this way will we achieve a safe and efficient clinical implementation of AI methods. In addition, federated learning approaches [
      • Sheller M.J.
      • Anthony Reina G.
      • Edwards B.
      • Martin J.
      • Bakas S.
      Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation.
      ,
      • Sheller M.J.
      • Edwards B.
      • Anthony Reina G.
      • Martin J.
      • Pati S.
      • Kotrotsou A.
      • et al.
      Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data.
      ,
      • Chang K.
      • Balachandar N.
      • Lam C.
      • Yi D.
      • Brown J.
      • Beers A.
      • et al.
      Distributed deep learning networks among institutions for medical imaging.
      ] can be used to train AI models across institutions while ensuring data privacy, sharing clinical knowledge, and reaping the advantages of collaborative AI solutions.
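      The core of federated averaging can be sketched as follows; local linear-regression updates stand in for local deep-network training, and only the weights, never the private data, reach the aggregation step (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def local_update(weights, x, y, lr=0.1, epochs=20):
    """One institution: gradient-descent steps of linear regression on its
    private data (a stand-in for local deep-network training)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

# Three institutions with private datasets drawn from the same ground truth.
true_w = np.array([1.5, -0.7])
datasets = []
for _ in range(3):
    x = rng.normal(size=(50, 2))
    y = x @ true_w + rng.normal(scale=0.01, size=50)
    datasets.append((x, y))

global_w = np.zeros(2)
for _ in range(10):                            # federated rounds
    local_ws = [local_update(global_w, x, y) for x, y in datasets]
    # Server aggregates weights only; images/labels never leave each site.
    global_w = np.mean(local_ws, axis=0)
# global_w now approximates true_w without any data pooling.
```

      Real deployments add secure aggregation and weighting by local dataset size, but the communication pattern is the same: weights out, averaged weights back.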
      Investing time and collaborative effort in high-quality databases is certainly the way to move forward. So far, two aspects have played an important role in the recent development of AI and ML, namely, data repositories [
      • Prior F.
      • Almeida J.
      • Kathiravelu P.
      • Kurc T.
      • Smith K.
      • Fitzgerald T.J.
      • et al.
      Open access image repositories: high-quality data to enable machine learning research.
      ] and contests [

      Challenges n.d. https://grand-challenge.org/challenges/ (accessed December 3, 2020).

      ,
      • Prevedello L.M.
      • Halabi S.S.
      • Shih G.
      • Wu C.C.
      • Kohli M.D.
      • Chokshi F.H.
      • et al.
      Challenges related to artificial intelligence research in medical imaging and the importance of image analysis competitions.
      ], the former feeding the latter. Competitions foster emulation among the actors of the domain and allow state-of-the-art models to be benchmarked. A few examples have been cited in this manuscript, but multiple competitions every year lead to public data repositories [
      • Diaz O.
      • Kushibar K.
      • Osuala R.
      • Linardos A.
      • Garrucho L.
      • Igual L.
      • et al.
      Data preparation for artificial intelligence in medical imaging: a comprehensive guide to open-access platforms and tools.
      ,
      • Menze B.H.
      • Jakab A.
      • Bauer S.
      • Kalpathy-Cramer J.
      • Farahani K.
      • Kirby J.
      • et al.
      The multimodal brain tumor image segmentation benchmark (BRATS).
      ,

      Babier A, Zhang B, Mahmood R, Moore KL, Purdie TG, McNiven AL, et al. OpenKBP: the open-access knowledge-based planning grand challenge. arXiv [physics.med-ph] 2020.

      ]. A very recent example is the breakthrough of AI in the CASP competition [
      • Callaway E.
      “It will change everything”: DeepMind’s AI makes gigantic leap in solving protein structures.
      ]. However, the results and rankings from competitions must be interpreted carefully when transferring the acquired knowledge into clinical applications [
      • Maier-Hein L.
      • Eisenmann M.
      • Reinke A.
      • Onogur S.
      • Stankovic M.
      • Scholz P.
      • et al.
      Why rankings of biomedical image analysis competitions should be interpreted with care.
      ,
      • Reinke A.
      • Eisenmann M.
      • Onogur S.
      • Stankovic M.
      • Scholz P.
      • Full P.M.
      • et al.
      How to exploit weaknesses in biomedical challenge design and organization.
      ]. Due to the high stakes in the medical domain, the community should devote even stronger efforts, and international organizations should issue recommendations for data collection and curation, as well as for the design of contests and competitions. In the long term, this would lead to a much more structured and uniform clinical practice, with reduced differences between centres. Bigger, more homogeneous data could then potentially allow for another level of AI, by extracting much finer information at the level of large populations. If data for AI is still in its infancy, so are the methods. In spite of amazing progress and impressive results, current AI remains cast within tight frameworks. AI for images has been dealt with here; other application domains focus on natural language and speech processing with related but quite different approaches. So far, computer vision and audition follow separate specialized approaches, although some works attempt to bridge the gap, like automatic image captioning. Nevertheless, these building blocks remain mostly separate, lack integration, and are still considered weak AI. For instance, typical CNNs boil down to just big filter banks, without any notion of time, and thus no memory and no experience. Strong AI is going to emerge when AI for images and speech, as well as active learning [
      • Smailagic A.
      • Costa P.
      • Noh H.Y.
      • Walawalkar D.
      • Khandelwal K.
      • Galdran A.
      • et al.
      MedAL: accurate and robust deep active learning for medical image analysis.
      ,
      • Kim T.
      • Lee K.
      • Ham S.
      • Park B.
      • Lee S.
      • Hong D.
      • et al.
      Active learning for accuracy enhancement of semantic segmentation with CNN-corrected label curations: evaluation on kidney segmentation in abdominal CT.
      ], will be combined into a sort of Frankensteinian brain, in which specialized lobes for the different senses get interconnected. This will allow for richer interaction, explainability through speech, reference to past experience, and continuous improvement. Such a path has already been paved for autonomous driving, with its different levels of automation. Confidence in ever more complex AI will grow only if AI can become more anthropomorphic, at least from a functional point of view.
      In conclusion, artificial intelligence methods, and in particular, machine and deep learning methods, have reached important milestones in the last few years, demonstrating their potential to improve and automate the medical practice. However, a safe and full integration of these methods into the clinical workflow still requires a multidisciplinary effort (computer science, IT, medical experts, …) to enable the next generation of strong AI methods, ensuring robust and interpretable AI-based solutions.

      Acknowledgements

      Ana Barragán is funded by the Walloon region in Belgium (PROTHERWAL/CHARP, grant 7289). Gilmer Valdés was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number K08EB026500. Dan Nguyen is supported by the National Institutes of Health (NIH) R01CA237269 and the Cancer Prevention & Research Institute of Texas (CPRIT) IIRA RP150485. Liesbeth Vandewinckele is supported by a Ph.D. fellowship of the research foundation - Flanders (FWO), mandate 1SA6121N. Kevin Souris is funded by the Walloon region (MECATECH / BIOWIN, grant 8090). John A. Lee is a Senior Research Associate with the F.R.S.-FNRS.

      References

        • Singh R.
        • Wu W.
        • Wang G.
        • Kalra M.K.
        Artificial intelligence in image reconstruction: the change is here.
        Phys Med. 2020; 79: 113-125
        • Wang M.
        • Zhang Q.
        • Lam S.
        • Cai J.
        • Yang R.
        A review on application of deep learning algorithms in external beam radiotherapy automated treatment planning.
        Front Oncol. 2020; 10: 580919
      1. Wang C, Zhu X, Hong JC, Zheng D. Artificial intelligence in radiotherapy treatment planning: present and future. Technol Cancer Res Treat 2019;18:1533033819873922.

        • Litjens G.
        • Kooi T.
        • Bejnordi B.E.
        • Setio A.A.A.
        • Ciompi F.
        • Ghafoorian M.
        • et al.
        A survey on deep learning in medical image analysis.
        Med Image Anal. 2017; 42: 60-88https://doi.org/10.1016/j.media.2017.07.005
        • Wang T.
        • Lei Y.
        • Fu Y.
        • Wynne J.F.
        • Curran W.J.
        • Liu T.
        • et al.
        A review on medical imaging synthesis using deep learning and its clinical applications.
        J Appl Clin Med Phys. 2021; 22: 11-36
        • Thompson R.F.
        • Valdes G.
        • Fuller C.D.
        • Carpenter C.M.
        • Morin O.
        • Aneja S.
        • et al.
        Artificial intelligence in radiation oncology: a specialty-wide disruptive transformation?.
        Radiother Oncol. 2018; 129: 421-426https://doi.org/10.1016/j.radonc.2018.05.030
        • Hosny A.
        • Parmar C.
        • Quackenbush J.
        • Schwartz L.H.
        • Aerts H.J.W.L.
        Artificial intelligence in radiology.
        Nat Rev Cancer. 2018; 18: 500-510
        • Morra L.
        • Delsanto S.
        • Correale L.
        Artificial Intelligence in Medical Imaging. 2019; https://doi.org/10.1201/9780367229184
      2. Ranschaert ER, Morozov S, Algra PR, editors. Artificial intelligence in medical imaging: opportunities, applications and risks. Springer, Cham; 2019.

      3. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning. vol. 1. Springer series in statistics. New York; 2001.

        • Kuhn M.
        • Johnson K.
        Applied predictive modeling.
        Springer, New York, NY2013
        • Shen C.
        • Nguyen D.
        • Zhou Z.
        • Jiang S.B.
        • Dong B.
        • Jia X.
        An introduction to deep learning in medical physics: advantages, potential, and challenges.
        Phys Med Biol. 2020;
        • Cui S.
        • Tseng H.
        • Pakela J.
        • Ten Haken R.K.
        • El Naqa I.
        Introduction to machine and deep learning for medical physicists.
        Med Phys. 2020; https://doi.org/10.1002/mp.14140
        • Holman J.G.
        • Cookson M.J.
        Expert systems for medical applications.
        J Med Eng Technol. 1987; 11: 151-159
        • Haug P.J.
        Uses of diagnostic expert systems in clinical care.
        Proc Annu Symp Comput Appl Med Care. 1993; : 379-383
        • Miller R.A.
        Medical diagnostic decision support systems–past, present, and future: a threaded bibliography and brief commentary.
        J Am Med Inform Assoc. 1994; 1: 8-27
      4. Buchanan BG, Shortliffe EH. Rule-based expert systems: the MYCIN experiments of the Stanford Heuristic Programming Project. Addison-Wesley Publishing Company; 1984.

        • Aikins J.S.
        • Kunz J.C.
        • Shortliffe E.H.
        • Fallat R.J.
        PUFF: an expert system for interpretation of pulmonary function data.
        Comput Biomed Res. 1983; 16: 199-208
        • Miller R.A.
        • Pople Jr, H.E.
        • Myers J.D.
        Internist-1, an experimental computer-based diagnostic consultant for general internal medicine.
        N Engl J Med. 1982; 307: 468-476
      5. Buchanan BG. Can Machine Learning Offer Anything to Expert Systems? In: Marcus S, editor. Knowledge Acquisition: Selected Research and Commentary: A Special Issue of Machine Learning on Knowledge Acquisition, Boston, MA: Springer US; 1990, p. 5–8.

        • Su M.C.
        Use of neural networks as medical diagnosis expert systems.
        Comput Biol Med. 1994; 24: 419-429
        • Aizenberg I.N.
        • Aizenberg N.N.
        • Vandewalle J.
        Multi-valued and universal binary neurons. 2000; https://doi.org/10.1007/978-1-4757-3115-6
        • Liu X.
        • Faes L.
        • Kale A.U.
        • Wagner S.K.
        • Fu D.J.
        • Bruynseels A.
        • et al.
        A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis.
        Lancet Digit Health. 2019; 1: e271-e297
        • Esteva A.
        • Kuprel B.
        • Novoa R.A.
        • Ko J.
        • Swetter S.M.
        • Blau H.M.
        • et al.
        Dermatologist-level classification of skin cancer with deep neural networks.
        Nature. 2017; 542: 115-118
        • Lotter W.
        • Diab A.R.
        • Haslam B.
        • Kim J.G.
        • Grisot G.
        • Wu E.
        • et al.
        Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach.
        Nat Med. 2021; 27: 244-249
      6. Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [csCV] 2018.

        • Shen L.
        • Margolies L.R.
        • Rothstein J.H.
        • Fluder E.
        • McBride R.
        • Sieh W.
        Deep learning to improve breast cancer detection on screening mammography.
        Sci Rep. 2019; 9: 12495
        • Komura D.
        • Ishikawa S.
        Machine learning methods for histopathological image analysis.
        Comput Struct Biotechnol J. 2018; 16: 34-42https://doi.org/10.1016/j.csbj.2018.01.001
        • Yadav S.S.
        • Jadhav S.M.
        Deep convolutional neural network based medical image classification for disease diagnosis.
        J Big Data. 2019; 6: 113
        • Zhang X.
        • Zhao H.
        • Zhang S.
        • Li R.
        A novel deep neural network model for multi-label chronic disease prediction.
        Front Genet. 2019; 10: 351
        • Hesamian M.H.
        • Jia W.
        • He X.
        • Kennedy P.
        Deep learning techniques for medical image segmentation: achievements and challenges.
        J Digit Imaging. 2019; 32: 582-596