Overlooked pitfalls in multi-class machine learning classification in radiation oncology and how to avoid them

Published: January 25, 2020


      • A Monte Carlo (MC) study showed the limitations of correlation coefficients in multi-class classification.
      • Explanation of why categorical outcome prediction requires special consideration.
      • An MC simulation was designed to show the shortcomings of surrogate biomarkers in clinical trials.


      In radiation oncology, machine learning classification publications typically concern two outcome classes, e.g., the presence or absence of distant metastasis. However, multi-class classification problems also have great clinical relevance, e.g., predicting the grade of a treatment complication following lung irradiation. This work comprised two studies aimed at making research in this domain less prone to statistical pitfalls.
      In multi-class classification, the AUC is not defined, whereas correlation coefficients are. It may therefore seem that quoting only the correlation coefficient (in lieu of the AUC) is a suitable choice. In the first study, we used Monte Carlo (MC) models to illustrate why this choice is misleading. We also considered the special case where the classes are nominal rather than ordinal, and explained why Pearson or Spearman correlation coefficients then not only provide incomplete information but are actually meaningless.
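      The point about nominal classes can be illustrated with a minimal sketch. It is not the authors' MC model: the three classes, the hypothetical classifier (which confuses A and B 40% of the time and always gets C right), and the two integer codings are assumptions chosen for illustration. Because nominal labels have no natural order, any integer coding is arbitrary, yet the Pearson coefficient changes with the coding while the predictions themselves do not.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate(n=3000, seed=0):
    """Return (accuracy, r under coding 1, r under coding 2) for one simulated cohort."""
    rng = random.Random(seed)
    swap = {"A": "B", "B": "A"}
    # Assumed setup: three equally likely nominal classes.
    true = [rng.choice("ABC") for _ in range(n)]
    # Hypothetical classifier: A and B are swapped 40% of the time, C is always correct.
    pred = [swap[t] if t in swap and rng.random() < 0.4 else t for t in true]
    accuracy = sum(t == p for t, p in zip(true, pred)) / n

    # Two equally arbitrary integer codings of the same nominal labels.
    coding_1 = {"A": 0, "B": 1, "C": 2}
    coding_2 = {"A": 0, "C": 1, "B": 2}
    r1 = pearson([coding_1[t] for t in true], [coding_1[p] for p in pred])
    r2 = pearson([coding_2[t] for t in true], [coding_2[p] for p in pred])
    return accuracy, r1, r2


if __name__ == "__main__":
    acc, r1, r2 = simulate()
    print(f"accuracy={acc:.3f}  r(coding 1)={r1:.3f}  r(coding 2)={r2:.3f}")
```

      For these assumed confusion rates the two coefficients are about 0.8 and 0.2 in expectation, although the classifier and its accuracy are identical in both cases. The coefficient therefore reflects the labelling convention rather than classifier performance.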
      The second study concerned surrogate biomarkers for a clinical endpoint, whose purported benefits include the potential for early assessment, low cost, and non-invasiveness. Using an MC experiment, we showed how conclusions derived from surrogate markers can be misleading. The simulated endpoint was radiation toxicity (graded on a scale of 0–5). The surrogate marker was the true toxicity grade plus a noise term. Five patient cohorts were simulated, including one control. Two of the cohorts were designed to have a statistically significant difference in toxicity. Over 1000 repeated experiments using the biomarker, these two cohorts were often found to be statistically indistinguishable, with the fraction of such occurrences rising with the level of noise.
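      The surrogate-marker experiment can be sketched along the following lines. This is a simplified illustration under stated assumptions, not the authors' simulation: only the two differing cohorts are modelled, and the grade probabilities, cohort size (300), Gaussian noise, significance level (0.05), and the large-sample two-sided test on the difference of means are all hypothetical choices.

```python
import random
from statistics import NormalDist

GRADES = [0, 1, 2, 3, 4, 5]
# Hypothetical toxicity grade probabilities for the two cohorts (assumption).
WEIGHTS_A = [0.30, 0.30, 0.20, 0.10, 0.07, 0.03]
WEIGHTS_B = [0.20, 0.25, 0.25, 0.15, 0.10, 0.05]


def mean_and_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def p_value(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mx, vx = mean_and_var(x)
    my, vy = mean_and_var(y)
    z = (mx - my) / (vx / len(x) + vy / len(y)) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fraction_indistinguishable(noise_sd, n_patients=300, n_experiments=1000,
                               alpha=0.05, seed=0):
    """Fraction of repeated experiments in which the surrogate marker
    fails to separate the two cohorts at significance level alpha."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_experiments):
        # Surrogate marker = true toxicity grade + Gaussian noise term.
        a = [g + rng.gauss(0.0, noise_sd)
             for g in rng.choices(GRADES, WEIGHTS_A, k=n_patients)]
        b = [g + rng.gauss(0.0, noise_sd)
             for g in rng.choices(GRADES, WEIGHTS_B, k=n_patients)]
        if p_value(a, b) >= alpha:
            misses += 1
    return misses / n_experiments


if __name__ == "__main__":
    for sd in (0.0, 2.0, 4.0):
        print(f"noise sd={sd}: fraction indistinguishable = "
              f"{fraction_indistinguishable(sd):.3f}")
```

      With zero noise the marker equals the true grade, so the cohorts are separated in most repetitions. As the assumed noise standard deviation grows, the fraction of repetitions in which the cohorts appear indistinguishable should rise, in line with the behaviour described above.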


