Requirements and reliability of AI in the medical context

Published: March 12, 2021


      • Reviews developments in Machine Learning (ML) and Artificial Intelligence (AI).
      • Focuses on the translational application of ML methods in oncology.
      • Identifies current impediments to reliable and reproducible AI methods.
      • Provides recommendations for the reliable, ethical use of AI methods.


      The digital information age has been a catalyst for renewed interest in Artificial Intelligence (AI) approaches, especially the subclass of computer algorithms popularly grouped under Machine Learning (ML). These methods allow researchers to go beyond the limits of human cognition in understanding the complexity of high-dimensional data. The medical sciences have made steady use of these methods but have been slow to adopt them to improve patient care. Several significant impediments have diluted this effort, including the limited availability of curated, diverse data sets for model building, the difficulty of reliable human-level interpretation of these models, and the lack of reliable reproducibility of these methods for routine clinical use. Each of these aspects carries several limiting conditions that must be balanced against one another, considering the data and model-building effort, clinical implementation, and the integration cost of translation with minimal patient-level harm, all of which may directly affect future clinical adoption. In this review, we assess each aspect of the problem in the context of the reliable use of ML methods in oncology, as a representative case study, with the goal of safeguarding utility and improving patient care in medicine in general.


      Physica Medica: European Journal of Medical Physics

