Performance of an artificial intelligence tool with real-time clinical workflow integration – Detection of intracranial hemorrhage and pulmonary embolism

  • Nico Buls 1 (corresponding author)
  • Nina Watté 1
  • Koenraad Nieboer
  • Bart Ilsen
  • Johan de Mey
    Affiliation (all authors)
    Department of Radiology, Universitair Ziekenhuis Brussel (UZ Brussel), Vrije Universiteit Brussel (VUB), Laarbeeklaan 101, 1090 Brussels, Belgium
    Author Footnotes
    1 Nico Buls and Nina Watté have contributed equally to first authorship.
Open Access. Published: March 25, 2021. DOI: https://doi.org/10.1016/j.ejmp.2021.03.015

      Highlights

      • Intra-cranial hemorrhage and pulmonary embolism are life-threatening pathologies.
      • CT imaging is essential to confirm diagnosis.
      • In a real-time clinical setting AI shows the potential to rule out ICH and PE.
      • The positive predictive value of AI remains moderate.
      • AI has the potential to assist radiologists and serve as real-time clinical adjunct.

      Introduction

      Acute pathologies require early detection and prompt communication of critical findings to ensure adequate clinical management. Intra-cranial hemorrhage (ICH) and pulmonary embolism (PE) are two such frequent, life-threatening pathologies with significant morbidity and mortality, where misdiagnosis can lead to adverse outcomes [
      • van Asch C.J.
      • Luitse M.J.
      • Rinkel G.J.
      • van der Tweel I.
      • Algra A.
      • Klijn C.J.
      Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis.
      ,
      • Heit J.J.
      • Iv M.
      • Wintermark M.
      Imaging of intracranial hemorrhage.
      ,
      • Morales H.
      Pitfalls in the imaging interpretation of intracranial hemorrhage.
      ]. A non-contrast head CT scan is essential for the diagnosis and risk stratification of ICH, while contrast-enhanced computed tomography pulmonary angiography (CTPA) is the standard scan for detecting and locating PE [
      • Morales H.
      Pitfalls in the imaging interpretation of intracranial hemorrhage.
      ,
      • Estrada-Y-Martin R.M.
      • Oldham S.A.
      CTPA as the gold standard for the diagnosis of pulmonary embolism.
      ].
      Advances in CT technology have improved image quality and reduced radiation dose, which allows the diagnosis of more subtle lesions. However, the increasing number of examinations, and of images per examination, can have a disproportionate effect on the radiologist’s workflow. In their study on the influence of technological advancements in cross-sectional imaging on the radiology workflow, McDonald et al. calculated that a radiologist analyses an average of one image every three seconds [
      • McDonald R.J.
      • Schwartz K.M.
      • Eckel L.J.
      • et al.
      The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload.
      ]. This time pressure on the practicing radiologist can lead to an increase in false-negative results and misdiagnoses [
      • Grob D.
      • Smit E.
      • Oostveen L.J.
      • et al.
      Image quality of iodine maps for pulmonary embolism: A comparison of subtraction CT and dual-energy CT [published online ahead of print, 2019 Mar 12].
      ,
      • Brady A.P.
      Error and discrepancy in radiology: inevitable or avoidable?.
      ,
      • Sokolovskaya E.
      • Shinde T.
      • Ruchman R.B.
      • et al.
      The effect of faster reporting speed for imaging studies on the number of misses and interpretation errors: A pilot study.
      ]. Real-time double reading by a peer is often performed and has been shown to lower the prevalence of misdiagnosis; however, it is very labor-intensive. In addition, retrospective peer review of cases does not immediately improve the patient’s clinical outcome, especially not in an acute setting [
      • Geijer H.
      • Geijer M.
      Added value of double reading in diagnostic radiology, a systematic review.
      ,
      • Muroff L.R.
      • Berlin L.
      Speed versus interpretation accuracy: Current thoughts and literature review.
      ,
      • Babiarz L.S.
      • Yousem D.M.
      Quality control in neuroradiology: discrepancies in image interpretation among academic neuroradiologists.
      ].
      Given the potential adverse outcome in case of misdiagnosis of ICH or PE, the increasing radiology workload, the constant development of new advanced computed tomography techniques, and the pandemics that now affect our health care system, artificial intelligence (AI) technologies can assist radiologists and serve as a real-time clinical adjunct to diagnose ICH and PE. AI algorithms using convolutional neural networks (CNN) based on deep learning, which can detect these life-threatening lesions, are becoming accessible [
      • Arbabshirani M.R.
      • Fornwalt B.K.
      • Mongelluzzo G.J.
      • et al.
      Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration.
      ,
      • Prevedello L.M.
      • Erdal B.S.
      • Ryu J.L.
      • et al.
      Automated critical test findings identification and online notification system using artificial intelligence in imaging.
      ,
      • Chang P.D.
      • Kuoy E.
      • Grinband J.
      • et al.
      Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT.
      ]. AI technologies have multiple potential roles, such as quality assurance and productivity enhancement. However, certain roles within specific pathologies have not yet been fully investigated. Implementing an AI tool in a real-time radiology workflow has the potential to enable an earlier reaction and/or even to flag lesions that can easily be overlooked by a radiologist [
      • Chilamkurthy S.
      • Ghosh R.
      • Tanamala S.
      • et al.
      Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study.
      ,
      • Prevedello L.M.
      • Erdal B.S.
      • Ryu J.L.
      • et al.
      Automated critical test findings identification and online notification system using artificial intelligence in imaging.
      ,
      • Paiva O.A.
      • Prevedello L.M.
      The potential impact of artificial intelligence in radiology.
      ,

      Ojeda P, Zawaideh M, Mossa-Basha M, et al. The utility of deep learning: evaluation of a convolutional neural network for detection of intracranial bleeds on non-contrast head computed tomography studies. SPIE Medical Imaging, 2019, Proceedings Volume 10949, Medical Imaging 2019: Image Processing; 109493J.

      ,
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ].
      Much research in recent years has focused on such AI solutions, with reported values of 0.95 for sensitivity, 0.99 for specificity, 0.98 for negative predictive value (NPV), 0.98 for positive predictive value (PPV) and 0.98 for overall accuracy in ICH detection. Rao et al. applied an AI solution to negative-by-report ICH cases. They found a false-negative rate of 1.6% for radiologists in ICH detection, and thus the technology could serve to minimize false negatives [
      • Rao B.
      • Zohrabian V.
      • Cedeno P.
      • Saha A.
      • Pahade J.
      • Davis M.A.
      Utility of artificial intelligence tool as a prospective radiology peer reviewer - detection of unreported intracranial hemorrhage [published online ahead of print, 2020 Feb 24].
      ]. Weikert et al. found a high degree of diagnostic accuracy for PE detection on CTPAs, and a balanced sensitivity and specificity of 0.93 and 0.96 [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ]. For PE, sensitivity, specificity, positive and negative predictive values and accuracy compared with senior radiologists as the gold standard were also reported, at 0.85, 0.97, 0.85, 0.97 and 0.95 respectively [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ].
      The purpose of this study was to assess the performance of a commercially available AI tool as a second reader in detecting ICH and PE in real time in a diverse clinical setting (e.g., emergency, routine, inpatient, and outpatient), by evaluating the number of studies processed by the AI tool and calculating its diagnostic performance.

      Materials and methods

      This retrospective study was conducted with approval by our local institutional medical ethics committee with a waiver of informed consent. Our study was bipartite and focused on two acute pathologies: intra-cranial hemorrhage (ICH) and pulmonary embolism (PE). Each part consisted of 4 stages: (1) dataset collection; (2) image data processing by an automated AI tool; (3) quality control and discrepancy revision by registered radiologists with a certificate of added qualification in neuroradiology or thorax radiology; and (4) diagnostic performance analysis.

      Dataset collection

      Random case collection was performed from a consecutive database of patients referred to our radiology department for a non-contrast head CT or CTPA, blinded to clinical data regarding antecedents, diagnosis, therapy or outcome. For both the brain and lung studies, patients under the age of 18 were excluded. CT exams were pseudo-anonymized, retaining solely an identification code to link each report to its respective study. Control CTs were excluded, which eliminated duplicate exams and resulted in a final cohort of unique patients and scans.
      A total of 500 consecutive non-contrast CT exams of the head performed over 31 days from September 1, 2019 until October 1, 2019, were included. This consecutive case collection varies considerably in terms of neurologic pathology signs such as hemorrhage, mass effect, hydrocephalus, suspected acute infarct, encephalomalacia or no evidence of intracranial disease. Cases also differ markedly in hemorrhage age and attenuation on CT (respectively hypo-, iso- and hyperattenuating), hemorrhage size and location (epidural, subdural, subarachnoid and intraparenchymal). Scans with movement artifacts, sloped and postoperative studies remained included to represent standard practice.
      Secondly, we considered 500 consecutive CTPA scans performed between July 1, 2019 and February 1, 2020. This data set includes pulmonary emboli (central, segmental and subsegmental), common diseases such as interstitial pneumonias, acute respiratory distress syndrome, sarcoidosis, lymphangitic carcinomatosis and cardiogenic pulmonary edema, and normal findings. Scans containing moderate breathing or beam hardening artifacts were included as well to represent routine radiology practice.
      CT examinations were performed on one of our four different scanners, of which one used dual-energy technology (DECT). Table 1 specifies information with reference to the utilized scanner vendor, model, tube voltage, single collimation width, reconstruction slice thickness, reconstruction kernel and radiation doses. Data from all axial, coronal and sagittal planes, available from the picture archiving and communication system (PACS), were utilized. In addition, for the DECT scanner, material specific pulmonary iodine maps, an alternative for depicting perfusion defects in PE, were also available for the expert reviewers.
      Table 1. Scanner models, scan and reconstruction parameters, and radiation doses used for the two indications. Doses represent median values with 95% confidence intervals in brackets.
      Scanner model | Number of cases | Tube voltage (kV) | Single collimation width (mm) | Reconstructed slice thickness (mm) | Reconstruction method | Reconstruction kernel | CTDIvol (mGy) | DLP (mGy.cm)
      Intracranial hemorrhage (ICH), 500 cases
      GE Revolution | 212 (42%) | DECT | 0.625 | 0.625 | DLIR-M | Standard | 35.8 (35.5–37.2) | 724 (698–751)
      GE Discovery 750HD | 173 (35%) | 120 | 0.625 | 0.625 | ASiR 30% | Soft | 38.4 (37.5–39.4) | 756 (725–787)
      Philips iCT | 109 (22%) | 100 | 0.625 | 0.8 | iDose3 | UB (standard) | 29.9 (29.3–30.6) | 598 (581–614)
      Siemens Somatom AS40 | 6 (1%) | 120 | 0.6 | 0.6 | FBP | H31s | 60.8 | 1006 (990–1023)
      Pulmonary embolism (PE), 500 cases
      GE Revolution | 282 (56%) | DECT | 0.625 | 0.625 | ASiR-V 70% | Standard | 6.4 (5.9–6.8) | 229 (211–248)
      GE Discovery 750HD | 203 (41%) | 100–120 | 0.625 | 0.625 | ASiR 30% | Detail | 11.8 (10.8–12.8) | 441 (405–478)
      Philips iCT | 15 (3%) | 100–120 | 0.625 | 0.9 | iDose3 | B (standard) | 4.1 (3.8–4.4) | 174 (163–183)

      Image data processing by AI tool

      A commercially available, FDA- and CE-cleared (European Medical Devices Directive 93/42/EEC M5) AI tool based on convolutional neural networks (Aidoc version 1.3, Tel Aviv, Israel) was implemented in our radiological workflow. The algorithm was trained and tested on a dataset that included approximately 50,000 non-contrast head CT studies for the detection of ICH [

      Ojeda P, Zawaideh M, Mossa-Basha M, et al. The utility of deep learning: evaluation of a convolutional neural network for detection of intracranial bleeds on non-contrast head computed tomography studies. SPIE Medical Imaging, 2019, Proceedings Volume 10949, Medical Imaging 2019: Image Processing; 109493J.

      ] and 28,000 CTPA studies for the detection of PE [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ], collected from 9 different sites and 17 different scanner models. According to the manufacturer’s specifications, CT acquisition should be performed with a 64-slice scanner or higher and with a reconstructed slice thickness between 0.625 and 5.1 mm for ICH and 0.5–3.0 mm for PE. In addition, all technically inadequate scans should be excluded, such as scans with motion artifacts, severe metal artifacts, inadequate field of view and sub-optimal contrast bolus (PE).
      As soon as they are available in our PACS, CT images are automatically pseudo-anonymized and subsequently sent to a cloud server for AI processing. The AI technology processed the non-enhanced head CT dataset to rule out ICH and the CTPA data to diagnose PE. Afterwards, quantitative results and location-specific annotated images are sent back into the PACS as additional DICOM series. In case of intracranial bleeding or pulmonary embolism, these additional AI-marked series contain images with arrows pointing to where the pathology is situated. The AI report is seamlessly integrated into the clinical workflow, with the results being automatically added to the CT study. The typical time from CT acquisition to notification (AI results available in PACS) varies between 3 and 7 min for ICH studies and between 5 and 9 min for PE studies.
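      The manufacturer's acquisition requirements quoted above can be expressed as a simple pre-check. The following is a minimal sketch only: the `Series` container, its field names and the example detector-row counts are hypothetical, and the vendor's actual acceptance logic is not described in the source.

```python
from dataclasses import dataclass

# Slice-thickness windows (mm) and minimum scanner size, as quoted from the
# manufacturer's specifications in the text.
THICKNESS_MM = {"ICH": (0.625, 5.1), "PE": (0.5, 3.0)}
MIN_DETECTOR_ROWS = 64  # "64-slice scanner or higher"


@dataclass
class Series:
    # Hypothetical container; field names are illustrative, not a vendor schema.
    indication: str            # "ICH" or "PE"
    detector_rows: int         # assumed value, not reported per scanner in the text
    slice_thickness_mm: float


def is_eligible(series: Series) -> bool:
    """Return True if the series meets both quoted acquisition requirements."""
    low, high = THICKNESS_MM[series.indication]
    enough_rows = series.detector_rows >= MIN_DETECTOR_ROWS
    thickness_ok = low <= series.slice_thickness_mm <= high
    return enough_rows and thickness_ok


# Illustrative inputs: a 64-row head CT at 0.625 mm and a 40-row head CT at 0.6 mm.
print(is_eligible(Series("ICH", 64, 0.625)))
print(is_eligible(Series("ICH", 40, 0.6)))
```

      Under these assumptions, a scanner with fewer than 64 detector rows is rejected regardless of slice thickness.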

      Diagnostic performance and discrepancy review

      From the 500 consecutive head CT exams and CTPA exams that were presented to the AI tool, we registered the number of studies that were sent back with an AI report. Secondly, we evaluated its diagnostic performance by comparison with expert review. The original clinical radiology report after consensus review by 3 neuro-radiologists for ICH and 3 thorax radiologists for PE was considered the gold standard. Six board-certified radiologists participated in the consensus review, each with 5 to 15 years of experience in reading unenhanced head CT and CTPA studies. The reviewers had access to prior and future studies, and were able to see the clinical history and reports to reach a diagnosis. AI results were classified into true positive, false positive, true negative and false negative cases. True-positive (TP) cases contained hemorrhage or embolism detected by the AI tool and subsequently confirmed by the consensus reviewers. True-negative (TN) cases were exams with neither ICH nor PE according to both the AI tool and the reviewers. False-positive (FP) cases were flagged positive by the AI tool but found to be negative by consensus review. False-negative (FN) cases were classified as negative by the AI tool but judged positive for ICH/PE by consensus review. We quantified the diagnostic performance of the AI algorithm in ICH and PE detection by calculating the sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV) and accuracy. The concordance between the AI tool and consensus review for each pathology was calculated using the percentage of agreement and Cohen’s κ statistic.
      In addition, the expert reviewers performed a detailed discrepancy analysis of the false-positive and false-negative cases in order to identify the reason for misclassification by AI.
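      The measures named above all follow from the four cells of a 2x2 table. The sketch below shows the standard formulas, including Cohen's kappa. The example counts are an assumption: the source does not tabulate the 2x2 matrix, so the values were back-calculated to be consistent with the figures reported for ICH (388 processed studies, 37 expert-positive cases, 6 false negatives, 20 false positives).

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic metrics plus agreement statistics."""
    n = tp + fp + fn + tn
    # Observed agreement between AI and reviewers (identical to accuracy).
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of both readers.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "ppv": tp / (tp + fp),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Assumed counts (not given explicitly in the source), chosen to sum to 388.
ich = diagnostic_metrics(tp=31, fp=20, fn=6, tn=331)
for name, value in ich.items():
    print(f"{name}: {value:.2f}")
```

      With these assumed counts, the rounded sensitivity, specificity, NPV, PPV and accuracy agree with the ICH values reported in the study. The kappa value is sensitive to the exact cell counts and may differ from the reported one.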

      Results

      Table 2 summarizes the AI performance results for detecting ICH and PE. Of the 500 presented cases, the AI algorithm could process 77.6% (388 of 500) for ICH evaluation during the real-time radiology workflow. No ICH evaluation was performed for 112 studies. The process rate differed between scanner models, ranging from 84.9% (GE Discovery 750HD) to 56.8% (Philips iCT). All 6 Siemens cases were rejected for processing. Of the 388 processed studies, the AI tool flagged 31 (7.9%) exams as having ICH. Expert review flagged 37 (9.5%) hemorrhages. Substantial agreement (kappa value of 0.65) between AI and expert reading was observed. The performance for ICH showed 0.84 sensitivity and 0.94 specificity. The negative and positive predictive values were 0.98 and 0.61 respectively. The AI tool failed to label 1.7% (6 of 337) of cases that were agreed to have hemorrhage after review by three subspecialists (false negative by the AI tool). Those six cases, summarized in Table 3, comprise two discrete subarachnoid hemorrhages, two subdural hemorrhages and two parenchymal hemorrhages. Twenty positive results out of 37 flagged exams by the AI tool were labelled as false-positive findings after consensus review, which the reviewers attributed to falcine or basal ganglia calcifications (9/20 cases), beam hardening artifacts (8/20 cases) and hyperdense dural sinuses (3/20), as shown in Table 3.
      Table 2. Number of processed studies and diagnostic performance of the AI tool in detecting ICH and PE. Diagnostic accuracy values in brackets represent 95% confidence intervals.
      Item | ICH | PE
      Number of studies presented | 500 | 500
      Studies with AI result
      All scanners | 77.6% (388/500) | 89.6% (448/500)
      GE Revolution | 84.4% (179/212) | 92.5% (261/282)
      GE Discovery 750HD | 84.9% (147/173) | 90.6% (184/203)
      Philips iCT | 56.8% (62/109) a | 13.3% (2/15) a
      Siemens Somatom AS40 | 0% (0/6) a | N.a.
      Diagnostic performance
      Sensitivity | 0.84 (0.68–0.94) | 0.73 (0.62–0.82)
      Specificity | 0.94 (0.91–0.96) | 0.95 (0.93–0.97)
      NPV | 0.98 (0.96–0.99) | 0.94 (0.91–0.96)
      PPV | 0.61 (0.46–0.74) | 0.73 (0.62–0.82)
      Accuracy | 0.93 (0.90–0.96) | 0.98 (0.96–0.99)
      N.a. Not available.
      a Significantly lower than both GE scanners (p < 0.05, Fisher exact probability test).
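      The source does not state how the 95% confidence intervals in Table 2 were obtained. As a sketch, assuming the exact (Clopper-Pearson) binomial method, the interval for the ICH sensitivity can be computed as follows. The count of 31 detected out of 37 expert-positive cases is derived from the 6 reported false negatives and is therefore also an assumption.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided confidence interval for a proportion k/n, by bisection."""
    # Lower bound: p such that P(X >= k | p) = alpha/2 (tail grows with p).
    lower = 0.0
    if k > 0:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper bound: p such that P(X <= k | p) = alpha/2 (CDF shrinks with p).
    upper = 1.0
    if k < n:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


# Assumed counts: 31 of 37 expert-confirmed hemorrhages detected by the AI tool.
low, high = clopper_pearson(31, 37)
print(f"sensitivity 95% CI: {low:.2f} to {high:.2f}")
```

      Other interval methods (for example Wilson) would give slightly different bounds, so agreement with the tabulated interval only suggests, and does not confirm, which method the authors used.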
      Table 3. Detailed analysis of false negative and false positive ICH cases by AI.
      Detailed analysis of false negative ICH cases by AI
      Subarachnoid hemorrhages | 33% (2/6)
      Subdural hemorrhages | 33% (2/6)
      Parenchymal hemorrhages | 33% (2/6)
      Detailed analysis of false positive ICH cases by AI
      Falcine and basal ganglia calcifications | 45% (9/20)
      Beam hardening artifacts | 40% (8/20)
      Hyperdense dural sinuses | 15% (3/20)
      The AI technology created a report for 448 (89.6%) consecutive CTPAs. Similar to ICH, the process rate for the Philips iCT was lower (13.3%) than for the GE scanners (90.6% and 92.5%). The sensitivity, specificity and accuracy were 0.73, 0.95 and 0.90 respectively. The kappa value for agreement between AI and expert reading was 0.78, indicating substantial concordance. The expert readers detected 82 cases positive for PE. The AI system did not identify 19 of these 82 PE cases (false negative by AI). Nine of these patients had chronic pulmonary emboli. Six cases had masquerading artifacts. In three cases, an underlying pathology had concealed the emboli present. In one patient, a superimposing vein was the cause of the missed pulmonary embolism. Seventeen of the 19 misdiagnosed patients had subsegmental and segmental PE. Chronic known central emboli were missed in one patient. Lobar emboli were missed by AI in one patient during a delayed scan phase.
      24.4% of studies were found to be false-positive findings by the AI solution. According to the consensus reviewers, six FP cases were due to contrast agent-related flow artifacts, beam hardening artifacts and breathing artifacts. Another six false-positive cases were due to the masking effect of associated pathologies (such as infiltrate, metastasis, pleural effusion, atelectasis and fibrosis) or superimposing anatomy (e.g. pulmonary vein, lymph node, hilar soft tissue, bronchus, azygos vein or pulmonary artery bifurcation). In addition, 7 out of 18 patients had a false-positive diagnosis caused by a combination of the aforementioned factors.
      Examples of intracranial hemorrhages and pulmonary embolism with AI detection are shown in Figs. 1 to 4.
      Fig. 1. Left: true positive ICH case by AI, indicated by the yellow arrow at a cortical hemorrhage in the right frontal lobe, which was considered a difficult, subtle lesion by the reviewers. Right: true positive PE case by AI, indicated by the yellow arrow pointing to a segmental embolism in the anterior segment of the left upper lobe. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
      Fig. 2. False negative ICH case by AI: subtle stroke hemorrhagic transformation not identified by the AI tool (white arrow).
      Fig. 3. Left: false positive ICH case by AI (yellow arrow), probably due to the presence of surrounding periventricular white matter hypoattenuation. Right: same image with different window/level settings. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
      Fig. 4. Top (A and B): False negative PE case by AI, lesion (white arrow) probably missed due to masquerading anatomy by hilar soft tissue. Bottom (A and B): False positive PE case by AI, a pulmonary vein (yellow arrow) is the cause of a false positive marked case by AI. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

      Discussion

      This two-part study assessed the diagnostic performance of an AI algorithm for automatically ruling out ICH on non-contrast head CTs on the one hand, and diagnosing PE on CTPAs on the other hand, with real-time clinical workflow integration. In a diverse real-time clinical setting, 77.6% (388/500) of consecutive head CT exams and 89.6% (448/500) of CTPA exams could be automatically evaluated by AI. However, AI process rates increase to 84.6% (326/385) for head CT exams and 91.7% (445/485) for CTPA if we only consider the two GE scanners, which represent the bulk of the studies. We did not assess the cause of failure to process nor the follow-up of these patients, as this was not part of the study protocol. Possible causes can be hospital-network related or can be attributed to inadequate radiological quality due to, for example, increased noise or the presence of motion or metal artifacts. The few Siemens cases were rejected for AI processing (0/6) because they were not compliant with the AI tool requirements (<64 slice CT).
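      The scanner-dependent process rates were compared in Table 2 with a Fisher exact probability test. The sketch below is a pure-Python, two-sided version applied to the ICH counts for the Philips iCT (62 of 109 processed) and the GE Revolution (179 of 212 processed). The two-sided definition used here (summing all tables at most as probable as the observed one) is an assumption, since the source does not specify the variant or software used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(x_min, x_max + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: scanner; columns: processed / not processed (ICH counts from Table 2).
p_value = fisher_exact_two_sided(62, 109 - 62, 179, 212 - 179)
print(f"Philips iCT vs GE Revolution: p = {p_value:.2e}")
```

      The same function can be applied to any other pair of scanners in Table 2 by substituting the corresponding processed and unprocessed counts.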
      With a specificity of 0.94 and a negative predictive value of 0.98, our ICH study is in line with prior work. Previous research with the same AI tool reported a specificity of 0.99 and an NPV of 0.98 for ICH detection. However, the sensitivity and PPV were 0.84 and 0.61 in our study, and remain moderate in comparison with previous studies, in which a sensitivity of 0.95 and a PPV of 0.98 were obtained [
      • Rao B.
      • Zohrabian V.
      • Cedeno P.
      • Saha A.
      • Pahade J.
      • Davis M.A.
      Utility of artificial intelligence tool as a prospective radiology peer reviewer - detection of unreported intracranial hemorrhage [published online ahead of print, 2020 Feb 24].
      ]. Weikert et al. stated that prior research with a sensitivity above 0.85 accepted more false-positive findings, thereby increasing the radiologist’s workload and the time to therapy initiation [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ].
      False-positive cases were more frequent (54% or 20/37) than false negatives, and were mostly due to falcine or basal ganglia calcifications, hyperdense dural sinuses and streak artifacts. These results are in line with prior work by Rao et al., and such findings were recognized without difficulty by the original reporting radiologist [
      • Rao B.
      • Zohrabian V.
      • Cedeno P.
      • Saha A.
      • Pahade J.
      • Davis M.A.
      Utility of artificial intelligence tool as a prospective radiology peer reviewer - detection of unreported intracranial hemorrhage [published online ahead of print, 2020 Feb 24].
      ]. An example of a false-positive ICH case by AI (yellow arrow) is shown in Fig. 3, probably due to the presence of surrounding periventricular white matter hypoattenuation.
      False-negative ICH cases occurred in a limited number of cases (6/337) and were mainly seen in very small hemorrhages or in follow-up exams of deteriorating patients.
      The most common false-negative cases were sulcal subdural and subarachnoid hemorrhages, predominantly in the convexity regions of the brain. However, we had one less subtle hemorrhagic transformation of a stroke (Fig. 2). Although this missed finding could still be explained by very low HU densities within the lesion itself, it was fortunately diagnosed quickly by the original reporting radiologist. On the one hand, the interpreting radiologist should scrutinize these critical brain regions very carefully. On the other hand, although a missed subtle hemorrhage may be inconsequential, it still deserves attention and may indicate further examination.
      With high specificity and negative predictive value, the AI tool shows the potential to rule out ICH.
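      The contrast between a high NPV and a moderate PPV follows from the low prevalence of ICH in a consecutive cohort. The sketch below applies Bayes' rule to the reported ICH sensitivity (0.84) and specificity (0.94). The prevalence of 37/388 is taken from the expert-positive count among processed studies, while the other two prevalence values are purely illustrative assumptions.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 37/388 is the observed ICH prevalence; 0.02 and 0.30 are hypothetical settings.
for prevalence in (0.02, 37 / 388, 0.30):
    ppv, npv = predictive_values(0.84, 0.94, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

      At fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases, which is why predictive values from cohorts with different case mixes are not directly comparable.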
      For our PE study we achieved a rather low sensitivity of 0.73 when compared to the study by Weikert et al., who reached a sensitivity level of 0.93 [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ]. Firstly, this can be explained by a high prevalence of chronic emboli in our study population (9/19 cases). Secondly, 6 out of 19 patients had marked artifacts (movement, beam hardening and contrast agent-related flow artifacts). After consensus review, we also found 18 false-positive cases flagged by the AI tool. According to the reviewers, these were also mainly due to artifacts; the other cases can be explained by a combination of artifacts, masquerading pathologies (such as infiltrate, metastasis, pleural effusion, atelectasis, fibrosis) and superimposing anatomy (e.g. pulmonary vein, lymph node, hilar soft tissue, bronchus, azygos vein, pulmonary artery bifurcation). A detailed analysis of false-negative and false-positive PE cases by AI is reported in Table 4. Examples of false-negative and false-positive PE cases by AI are shown in Fig. 4. Although false-positive results can increase workload, these were again easily recognized by the expert readers. Moreover, an AI solution with a high false-negative rate can be more harmful, especially in outpatients [
      • Ye H.
      • Gao F.
      • Yin Y.
      • et al.
      Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network.
      ,
      • Ko H.
      • Chung H.
      • Lee H.
      • Lee J.
      Feasible study on intracranial hemorrhage detection and classification using a CNN-LSTM network.
      ,

      Kallmes DF, Erickson BJ. Automated Aneurysm Detection: Emerging from the Shallow End of the Deep Learning Pool [published online ahead of print, 2020 Nov 3]. Radiology. 2020;203853. 10.1148/radiol.2020203853.

      ,
      • Annarumma M.
      • Withey S.J.
      • Bakewell R.J.
      • Pesce E.
      • Goh V.
      • Montana G.
      Automated triaging of adult chest radiographs with deep artificial neural networks.
      ]. Using dual-energy CT with associated iodine maps for perfusion defect detection, the expert radiologists easily picked up false-negative AI reports. In future updates, the AI tool should incorporate these iodine maps as well to decrease false-negative results [
      • Grob D.
      • Smit E.
      • Oostveen L.J.
      • et al.
      Image quality of iodine maps for pulmonary embolism: A comparison of subtraction CT and dual-energy CT [published online ahead of print, 2019 Mar 12].
      ,
      • Bouma H.
      • Sonnemans J.J.
      • Vilanova A.
      • Gerritsen F.A.
      Automatic detection of pulmonary embolism in CTA images.
      ,
      • Tajbakhsh N.
      • Shin J.Y.
      • Gotway M.B.
      • Liang J.
      Computer-aided detection and visualization of pulmonary embolism using a novel, compact, and discriminative image representation.
      ,
      • Alis J.
      • Latson Jr, L.A.
      • Haramati L.B.
      • Shmukler A.
      Navigating the pulmonary perfusion map: dual-energy computed tomography in acute pulmonary embolism.
      ]. Even though an AI solution may potentially identify PE and thus assist radiologists, ultimately we have to weigh the significance of such findings, which for PE depends in most cases on location. Radiologists will give more significance to identifying a central PE than to missing a very small subsegmental PE [
      • Weikert T.
      • Winkel D.J.
      • Bremerich J.
      • et al.
      Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
      ,
      • Alis J.
      • Latson Jr, L.A.
      • Haramati L.B.
      • Shmukler A.
      Navigating the pulmonary perfusion map: dual-energy computed tomography in acute pulmonary embolism.
      ,
      • Wittenberg R.
      • Peters J.F.
      • Weber M.
      • et al.
      Stand-alone performance of a computer-assisted detection prototype for detection of acute pulmonary embolism: a multi-institutional comparison.
      ].
      Table 4. Detailed analysis of false negative and false positive PE cases by AI.

      Detailed analysis of false negative PE cases by AI
      Cause | Proportion of cases
      Chronic emboli | 47.3% (9/19)
      Artifacts (contrast agent-related flow artifacts, beam hardening artifacts, movement artifacts) | 31.6% (6/19)
      Masquerading pathology (infiltrate, metastasis, pleural effusion, atelectasis, fibrosis) | 15.7% (3/19)
      Superimposing anatomy | 5.2% (1/19)

      Detailed analysis of false positive PE cases by AI
      Cause | Proportion of cases
      Artifacts (contrast agent-related flow artifacts, beam hardening artifacts, movement artifacts) | 33.3% (6/18)
      Masquerading pathology (infiltrate, metastasis, pleural effusion, atelectasis, fibrosis) | 16.6% (3/18)
      Superimposing anatomy (pulmonary vein, lymph node, hilar soft tissue, bronchus, azygos vein, pulmonary artery bifurcation) | 16.6% (3/18)
      Combination | 38.8% (7/18)

      Interestingly, although the AI tool processed more PE studies, our data showed the tool to be more sensitive for ICH than for PE (0.84 versus 0.73). Even so, the PE study scored a higher positive predictive value of 0.76 versus 0.61 for ICH. Similar results were achieved regarding specificity (0.94 for ICH and 0.94 for PE) and accuracy (0.93 for ICH and 0.98 for PE). Future studies with adjusted prevalence of each target pathology could provide more insight. To our knowledge, no study has evaluated the diagnostic accuracy of AI technology during real-time radiology workflow in detecting ICH and PE using a consecutive dataset for each target pathology. Prior studies focused on prototype algorithms, which limits their usefulness in a clinical setting, whereas an important strength of this study is that the AI algorithm is commercially available and had never previously been exposed to images from our department or our CT equipment [
      • Ye H.
      • Gao F.
      • Yin Y.
      • et al.
      Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network.
      ]. Also, the AI technology used in our study was applied to data from two single-energy CTs and one dual-energy CT, suggesting the robustness of AI processing. Rao et al. studied retrospective peer review systems more closely, in particular to minimize false negatives, a setting in which this kind of AI tool could function as a real-time, prospective peer reviewer for radiologists [
      • Rao B.
      • Zohrabian V.
      • Cedeno P.
      • Saha A.
      • Pahade J.
      • Davis M.A.
      Utility of artificial intelligence tool as a prospective radiology peer reviewer - detection of unreported intracranial hemorrhage [published online ahead of print, 2020 Feb 24].
      ]. Earlier research was evaluated on rather small data sets with a high percentage of positive cases, or even exclusively positive cases, which does not represent real-time clinical workflow and might influence the reported diagnostic accuracy [

      Ojeda P, Zawaideh M, Mossa-Basha M, et al. The utility of deep learning: evaluation of a convolutional neural network for detection of intracranial bleeds on non-contrast head computed tomography studies. SPIE Medical Imaging, 2019, Proceedings Volume 10949, Medical Imaging 2019: Image Processing; 109493J.

      ,
      • Wittenberg R.
      • Peters J.F.
      • Weber M.
      • et al.
      Stand-alone performance of a computer-assisted detection prototype for detection of acute pulmonary embolism: a multi-institutional comparison.
      ,
      • Wittenberg R.
      • Peters J.F.
      • van den Berk I.A.
      • et al.
      Computed tomography pulmonary angiography in acute pulmonary embolism: the effect of a computer-assisted detection prototype used as a concurrent reader.
      ,
      • Wu J.T.
      • Wong K.C.L.
      • Gur Y.
      • et al.
      Comparison of chest radiograph interpretations by artificial intelligence algorithm vs radiology residents.
      ].
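      The measures compared above (sensitivity, specificity, positive predictive value, accuracy) all derive from the four cells of a 2×2 table of AI output against the reference standard. As a minimal sketch of that arithmetic, the Python function below computes them from raw counts. The function name `diagnostic_metrics` and the example counts are hypothetical illustrations and are not the data of this study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard diagnostic accuracy measures from 2x2 counts.

    tp/fn: reference-positive studies flagged / not flagged by the tool.
    fp/tn: reference-negative studies flagged / not flagged by the tool.
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of normal studies left unflagged
        "ppv": tp / (tp + fp),          # share of flagged studies that are true cases
        "npv": tn / (tn + fn),          # share of unflagged studies that are normal
        "accuracy": (tp + tn) / total,  # share of all studies classified correctly
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = diagnostic_metrics(tp=40, fp=20, tn=430, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

      Note that accuracy can remain high even with a modest PPV when negative studies dominate the dataset, which is one reason the individual measures are reported separately.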
      The following limitations of our study merit consideration. We included emergency patients, inpatients, and outpatients, who differ markedly in disease and signs of pathology. We did not study the underlying or associated pathologies or the characteristics (gender, age, etc.) of our study population, and we do not know whether these may have influenced our results. Likewise, we did not calculate the prevalence of our target pathologies at our institution. Prevalence varies remarkably by geography, and it is well known that prevalence has a strong influence on PPV and NPV. This can make performance dependent on the application site, which could lead to future replicability issues. Nevertheless, testing on all consecutive head CTs and CTPAs over a long time frame at our department ensured a representative picture of clinical reality regarding the ICH and PE distribution and the balance of positive and negative cases. All original radiology reports included in our study were used for making clinical decisions at our institution. As mentioned above, studies with artifacts, sloped studies, and even postoperative studies were included to represent the routine radiology work stream. These inclusion and exclusion criteria were established to approximate a standard practice data set and could partially explain the rather low sensitivity we achieved for ICH detection. Since our gold standard was the consensus review, we may still have missed undetected cases. Also, we did not assess the follow-up imaging of patients with missed ICH/PE. Another limitation is that we did not assess why some studies were not processed by the AI tool, nor did we consider the diagnostic accuracy stratified per scanner model. Future studies with a higher number of cases per scanner and a rejection analysis might provide interesting insights into the performance of AI tools as a function of the scanner model and the applied scan protocols, including image quality and radiation dose. An additional limitation of this study relates to the retrospective methodology in a single center. Lastly, we did not assess any AI solution for the detection of intracranial pathology other than hemorrhage or pulmonary pathology other than embolism.
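      The dependence of PPV and NPV on prevalence noted above follows directly from Bayes' theorem. The sketch below illustrates it using the ICH sensitivity (0.84) and specificity (0.94) reported in this article; the prevalence values in the loop are hypothetical, since prevalence was not calculated in this study, and the function name `predictive_values` is an illustrative choice.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for ICH in the text;
# the prevalence values are assumptions for illustration only.
for prevalence in (0.05, 0.10, 0.20, 0.40):
    ppv, npv = predictive_values(0.84, 0.94, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

      With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, which is why the same tool can show different predictive values at sites with different case mixes.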
      Because we evaluated the AI tool alongside the original reporting radiologist (and not without human primary and secondary review), it is hard to assess the clinical impact of the findings detected by the AI tool. Currently, the tool is integrated into our PACS as an automated triage system with a pop-up window whenever the tool suspects the presence of ICH or PE. In this way, quality increases because abnormalities are brought to our attention immediately. The automated case prioritization ensures that the most urgent patient is diagnosed first. We did not evaluate the time saved for each patient by the automated triage system. Future investigation should focus on the added value of worklist prioritization, which will give more information about the clinical impact as well.
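      The idea of worklist prioritization described above can be illustrated with a small priority queue. This is only a conceptual sketch: the article does not describe how the PACS or the AI vendor implements prioritization, and the class names, the two-level priority scheme, and the study identifiers below are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class WorklistEntry:
    priority: int   # 0 = flagged by the AI tool, 1 = not flagged (assumed scheme)
    sequence: int   # arrival order, used to break ties within a priority level
    study_id: str = field(compare=False)


class TriageWorklist:
    """Reading worklist in which AI-flagged studies are served first."""

    def __init__(self) -> None:
        self._heap: list[WorklistEntry] = []
        self._arrivals = count()

    def add(self, study_id: str, ai_flagged: bool) -> None:
        priority = 0 if ai_flagged else 1
        entry = WorklistEntry(priority, next(self._arrivals), study_id)
        heapq.heappush(self._heap, entry)

    def next_study(self) -> str:
        # Pops the flagged study that arrived earliest, else the earliest unflagged one.
        return heapq.heappop(self._heap).study_id

    def __len__(self) -> int:
        return len(self._heap)


worklist = TriageWorklist()
worklist.add("study-A", ai_flagged=False)
worklist.add("study-B", ai_flagged=True)
worklist.add("study-C", ai_flagged=False)
worklist.add("study-D", ai_flagged=True)

reading_order = [worklist.next_study() for _ in range(len(worklist))]
print(reading_order)  # ['study-B', 'study-D', 'study-A', 'study-C']
```

      Keeping arrival order as the tie-breaker means unflagged studies are still read first-in, first-out, so a false negative AI result delays a study but does not remove it from the radiologist's list.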

      Conclusion

      Our study demonstrates that, in a diverse clinical setting, an AI solution has the potential to assist radiologists and serve as a real-time clinical adjunct, both for non-contrast head CT scans to rule out ICH and for CTPAs to diagnose PE. An important fraction of consecutive studies could not be analyzed (22% of ICH studies and 10.4% of PE studies); however, these fractions were reduced to 15.4% for ICH and 8.3% for PE when considering the two scanners that performed the bulk of the scans. Although the AI tool processed more PE studies, our data showed the tool to be more sensitive for ICH than for PE (0.84 versus 0.73). Even so, the PE study scored a higher positive predictive value of 0.73 versus 0.61 for ICH. Similar results were achieved regarding specificity (0.94 for ICH and 0.95 for PE) and accuracy (0.93 for ICH and 0.98 for PE). In conclusion, our study provides data on an AI solution acting as an adjunct to the current real-time radiology workflow as a second reader of non-contrast CTs and CTPAs for detecting ICH and PE, respectively.

      Acknowledgment

      The authors sincerely thank all radiologists and physicists of the radiology department at Universitair Ziekenhuis Brussel for supporting the data collection and for their effort to carefully verify the ground truth of the dataset.

      References

        • van Asch C.J.
        • Luitse M.J.
        • Rinkel G.J.
        • van der Tweel I.
        • Algra A.
        • Klijn C.J.
        Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis.
        Lancet Neurol. 2010; 9: 167-176. https://doi.org/10.1016/S1474-4422(09)70340-0
        • Heit J.J.
        • Iv M.
        • Wintermark M.
        Imaging of intracranial hemorrhage.
        J Stroke. 2017; 19: 11-27. https://doi.org/10.5853/jos.2016.00563
        • Morales H.
        Pitfalls in the imaging interpretation of intracranial hemorrhage.
        Semin Ultrasound CT MR. 2018; 39: 457-468. https://doi.org/10.1053/j.sult.2018.07.001
        • Estrada-Y-Martin R.M.
        • Oldham S.A.
        CTPA as the gold standard for the diagnosis of pulmonary embolism.
        Int J Comput Assist Radiol Surg. 2011; 6: 557-563. https://doi.org/10.1007/s11548-010-0526-4
        • McDonald R.J.
        • Schwartz K.M.
        • Eckel L.J.
        • et al.
        The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload.
        Acad Radiol. 2015; 22: 1191-1198. https://doi.org/10.1016/j.acra.2015.05.007
        • Grob D.
        • Smit E.
        • Oostveen L.J.
        • et al.
        Image quality of iodine maps for pulmonary embolism: A comparison of subtraction CT and dual-energy CT [published online ahead of print, 2019 Mar 12].
        AJR Am J Roentgenol. 2019; 1-7. https://doi.org/10.2214/AJR.18.20786
        • Brady A.P.
        Error and discrepancy in radiology: inevitable or avoidable?
        Insights Imaging. 2017; 8: 171-182. https://doi.org/10.1007/s13244-016-0534-1
        • Sokolovskaya E.
        • Shinde T.
        • Ruchman R.B.
        • et al.
        The effect of faster reporting speed for imaging studies on the number of misses and interpretation errors: A pilot study.
        J Am Coll Radiol. 2015; 12: 683-688. https://doi.org/10.1016/j.jacr.2015.03.040
        • Geijer H.
        • Geijer M.
        Added value of double reading in diagnostic radiology, a systematic review.
        Insights Imaging. 2018; 9: 287-301. https://doi.org/10.1007/s13244-018-0599-0
        • Muroff L.R.
        • Berlin L.
        Speed versus interpretation accuracy: Current thoughts and literature review.
        AJR Am J Roentgenol. 2019; 213: 490-492. https://doi.org/10.2214/AJR.19.21290
        • Babiarz L.S.
        • Yousem D.M.
        Quality control in neuroradiology: discrepancies in image interpretation among academic neuroradiologists.
        AJNR Am J Neuroradiol. 2012; 33: 37-42. https://doi.org/10.3174/ajnr.A2704
        • Arbabshirani M.R.
        • Fornwalt B.K.
        • Mongelluzzo G.J.
        • et al.
        Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration.
        NPJ Digit Med. 2018; 1 (Published 2018 Apr 4): 9. https://doi.org/10.1038/s41746-017-0015-z
        • Prevedello L.M.
        • Erdal B.S.
        • Ryu J.L.
        • et al.
        Automated critical test findings identification and online notification system using artificial intelligence in imaging.
        Radiology. 2017; 285: 923-931. https://doi.org/10.1148/radiol.2017162664
        • Chang P.D.
        • Kuoy E.
        • Grinband J.
        • et al.
        Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT.
        AJNR Am J Neuroradiol. 2018; 39: 1609-1616. https://doi.org/10.3174/ajnr.A5742
        • Chilamkurthy S.
        • Ghosh R.
        • Tanamala S.
        • et al.
        Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study.
        Lancet. 2018; 392: 2388-2396. https://doi.org/10.1016/S0140-6736(18)31645-3
        • Paiva O.A.
        • Prevedello L.M.
        The potential impact of artificial intelligence in radiology.
        Radiol Bras. 2017; 50: V-VI. https://doi.org/10.1590/0100-3984.2017.50.5e1
        Ojeda P, Zawaideh M, Mossa-Basha M, et al. The utility of deep learning: evaluation of a convolutional neural network for detection of intracranial bleeds on non-contrast head computed tomography studies. SPIE Medical Imaging, 2019, Proceedings Volume 10949, Medical Imaging 2019: Image Processing; 109493J.

        • Weikert T.
        • Winkel D.J.
        • Bremerich J.
        • et al.
        Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm.
        Eur Radiol. 2020; 30: 6545-6553. https://doi.org/10.1007/s00330-020-06998-0
        • Rao B.
        • Zohrabian V.
        • Cedeno P.
        • Saha A.
        • Pahade J.
        • Davis M.A.
        Utility of artificial intelligence tool as a prospective radiology peer reviewer - detection of unreported intracranial hemorrhage [published online ahead of print, 2020 Feb 24].
        Acad Radiol. 2020; S1076-6332: 30084-30092. https://doi.org/10.1016/j.acra.2020.01.035
        • Ye H.
        • Gao F.
        • Yin Y.
        • et al.
        Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network.
        Eur Radiol. 2019; 29: 6191-6201. https://doi.org/10.1007/s00330-019-06163-2
        • Ko H.
        • Chung H.
        • Lee H.
        • Lee J.
        Feasible study on intracranial hemorrhage detection and classification using a CNN-LSTM network.
        Annu Int Conf IEEE Eng Med Biol Soc. 2020; 2020: 1290-1293. https://doi.org/10.1109/EMBC44109.2020.9176162
        Kallmes DF, Erickson BJ. Automated Aneurysm Detection: Emerging from the Shallow End of the Deep Learning Pool [published online ahead of print, 2020 Nov 3]. Radiology. 2020; 203853. https://doi.org/10.1148/radiol.2020203853

        • Annarumma M.
        • Withey S.J.
        • Bakewell R.J.
        • Pesce E.
        • Goh V.
        • Montana G.
        Automated triaging of adult chest radiographs with deep artificial neural networks.
        Radiology. 2019; 291: 196-202 [published correction appears in Radiology. 2019 Apr;291(1):272]. https://doi.org/10.1148/radiol.2018180921
        • Bouma H.
        • Sonnemans J.J.
        • Vilanova A.
        • Gerritsen F.A.
        Automatic detection of pulmonary embolism in CTA images.
        IEEE Trans Med Imaging. 2009; 28: 1223-1230. https://doi.org/10.1109/TMI.2009.2013618
        • Tajbakhsh N.
        • Shin J.Y.
        • Gotway M.B.
        • Liang J.
        Computer-aided detection and visualization of pulmonary embolism using a novel, compact, and discriminative image representation.
        Med Image Anal. 2019; 58: 101541. https://doi.org/10.1016/j.media.2019.101541
        • Alis J.
        • Latson Jr, L.A.
        • Haramati L.B.
        • Shmukler A.
        Navigating the pulmonary perfusion map: dual-energy computed tomography in acute pulmonary embolism.
        J Comput Assist Tomogr. 2018; 42: 840-849. https://doi.org/10.1097/RCT.0000000000000801
        • Wittenberg R.
        • Peters J.F.
        • Weber M.
        • et al.
        Stand-alone performance of a computer-assisted detection prototype for detection of acute pulmonary embolism: a multi-institutional comparison.
        Br J Radiol. 2012; 85: 758-764. https://doi.org/10.1259/bjr/26769569
        • Wittenberg R.
        • Peters J.F.
        • van den Berk I.A.
        • et al.
        Computed tomography pulmonary angiography in acute pulmonary embolism: the effect of a computer-assisted detection prototype used as a concurrent reader.
        J Thorac Imaging. 2013; 28: 315-321. https://doi.org/10.1097/RTI.0b013e3182870b97
        • Wu J.T.
        • Wong K.C.L.
        • Gur Y.
        • et al.
        Comparison of chest radiograph interpretations by artificial intelligence algorithm vs radiology residents.
        JAMA Netw Open. 2020; 3 (Published 2020 Oct 1): e2022779. https://doi.org/10.1001/jamanetworkopen.2020.22779