Accuracy of radiologists and radiology residents in detection of paediatric appendicular fractures with and without artificial intelligence

Methods

Randomly selected paediatric patients aged 2 to 15 years who presented to the children’s emergency department at our tertiary general hospital from November 2022 to February 2023 and had limb or pelvic radiographs taken were included in the study. The data were collected retrospectively (figure 1). According to a local population trends paper, our tertiary hospital serves a broad cross-section of the Singapore population, including one of the neighbourhoods with the highest proportion of children below the age of 5 [7]. No patient history or study indication was obtained, in accordance with institutional review board (IRB) waiver guidelines.

Figure 1. Study design. AI, artificial intelligence.

The radiographs included in this study were of the appendicular skeleton (pelvis and limbs), which the AI solution, RBfracture (Radiobotics, Denmark), is certified to analyse. Radiographs of the axial skeleton (spine, ribs and craniofacial bones) were excluded. The radiographs were obtained from the emergency department of a tertiary general hospital, which has a children’s emergency service run by a team of doctors, including a senior paediatrician.

Radiographs of both orthogonal views of the pelvis or limb, if available, were extracted from the picture archiving and communication system (PACS). The images were extracted in an anonymised Digital Imaging and Communications in Medicine (DICOM) format, the international standard to transmit, store, retrieve, print, process and display medical imaging information, which allows for ‘lossless’ file decompression and transmission. The images were stored locally on an on-site desktop-based server, on which both AI solution processing and DICOM image viewing were performed.

For this study, we used the CARPL.AI (CARPL.AI, USA) AI orchestrator platform, which provides test and validation deployment functions as well as a DICOM viewer.

AI assessment

RBfracture was developed by Radiobotics, a company founded in 2017 and headquartered in Copenhagen, Denmark, which focuses on developing software that augments the reading of musculoskeletal radiographs through AI and machine learning. RBfracture obtained a CE mark under the European Medical Device Regulation (MDR) as a class IIa medical device in 2022, providing clearance for the sale of the product in the European Union for clinical use. The RBfracture suite was trained on more than 100 000 patient cases, equivalent to 300 000 X-rays, from 1300 facilities in the USA and Europe. It takes DICOM radiographs as input and outputs annotated DICOM radiographs with bounding boxes around detected fractures, together with a conclusion stating whether a fracture is present or absent and a confidence score expressed as a percentage (online supplemental data 1–4). Since 2023, RBfracture V.2, which is the current iteration, has also received accreditation for the interpretation of limb, pelvic and rib fractures for ages two and above. One pilot study has shown 94% accuracy in RBfracture’s standalone performance and an 86% reduction in the missed fracture rate [8].

This AI solution was chosen because it is at present the only commercially available solution that has received certification for use in fracture detection in the appendicular skeleton (including the pelvis) of paediatric patients [9].

Radiologist readers

Three associate consultants (AC)/junior consultants and three senior residents/senior registrars (SR) in diagnostic radiology training were recruited as independent readers for the 500 cases in this study. Among these, one AC and one SR were randomly selected to read the radiographs with the aid of the AI solution. All readers were accredited as Fellows of The Royal College of Radiologists (UK). All readers had completed at least 3 months of training in paediatric radiology, the minimum requirement of the diagnostic radiology residency training programme in Singapore. SRs have undergone a total of 3–4 years of formal residency training, while ACs have completed 5 years of local residency training subsequent to residency exit. In our institution, SRs are deemed to have satisfied minimum training requirements and are accorded the right to independently report and verify paediatric radiographs, hence they were chosen for the purpose of this study.

All readers were instructed to record their interpretations (normal vs abnormal, specifying the site of pathology if abnormal), as well as the time taken to interpret each batch of 50 radiographs, on an online data collection form. Readers were allowed to perform the image interpretation at the designated workstation at their own convenience.

To determine the gold standard, the cases were categorised as concordant or discordant based on the results of the AI solution and the human readers. Cases were deemed concordant when all human readers (with or without the aid of AI) and the AI solution detected a fracture in the same anatomical region on a radiograph. Discordant cases, which included both human versus AI and human versus human differences, were arbitrated by two independent radiology consultants of musculoskeletal and body subspecialty, and finally by a senior consultant of musculoskeletal subspecialty. These consultants have between 6 and 14 years of experience in radiology reporting.
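The concordance rule described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the workflow used in the study: the function name, the reader labels and the use of None to mean "no fracture reported" are assumptions, as is treating unanimous "normal" reads as concordant, which the text does not state explicitly.

```python
from typing import Dict, Optional


def is_concordant(ai_region: Optional[str],
                  reader_regions: Dict[str, Optional[str]]) -> bool:
    """Return True when the AI and every human reader report the same finding.

    A string names the anatomical region of the detected fracture;
    None means no fracture was reported (an assumed encoding).
    """
    findings = [ai_region, *reader_regions.values()]
    return all(finding == findings[0] for finding in findings)


# Hypothetical reads for a single radiograph (labels are made up).
reads = {"AC1": "distal radius", "AC2": "distal radius", "SR1": None}
needs_arbitration = not is_concordant("distal radius", reads)
```

In this hypothetical example one reader disagrees with the others, so the case would be passed on for arbitration.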

Analyses generated by the AI solution that aligned with our established gold standard were categorised as true positives or true negatives. Radiographs in which the abnormality was not picked up were designated as false negatives. Radiographs deemed normal by human assessment but flagged as abnormal by AI were classified as false positives. In instances where multiple abnormalities were present, the AI solution’s result was deemed accurate if any one correct abnormality was picked up, given that the detection of any fracture in the clinical setting would entail further management and follow-up.
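The classification rules above, including the "any one correct abnormality" rule for radiographs with multiple fractures, can be expressed as a short Python sketch. The function name and the representation of findings as sets of anatomical site names are assumptions made for illustration and do not come from the study.

```python
from typing import Set


def classify_ai_result(gold_sites: Set[str], ai_sites: Set[str]) -> str:
    """Label one radiograph as TP, TN, FP or FN against the gold standard.

    gold_sites: fracture sites in the gold standard (empty set = normal).
    ai_sites:   fracture sites flagged by the AI (empty set = normal).
    """
    if not gold_sites:
        # Gold standard normal: any AI flag is a false positive.
        return "FP" if ai_sites else "TN"
    # Gold standard abnormal: one correctly located abnormality is enough.
    return "TP" if gold_sites & ai_sites else "FN"


# Hypothetical radiograph with two fractures, of which the AI finds one.
label = classify_ai_result({"distal radius", "distal ulna"}, {"distal radius"})
```

Under these assumptions, a radiograph on which the AI flags only a site that is not in the gold standard is still counted as a false negative, because the true abnormality was not picked up.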

The accuracy of each group of readers was tabulated, evaluating sensitivity, specificity, positive predictive value and negative predictive value. Results were then plotted as receiver operating characteristic (ROC) curves, and the area under the curve (AUC) of each group was calculated.
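For reference, the measures named above follow the standard definitions based on the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The Python sketch below is illustrative: the function names are assumptions, and the counts in the example are made up and are not results from this study.

```python
import math
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard diagnostic accuracy measures from confusion counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Made-up counts for illustration only; not data from the study.
example = diagnostic_metrics(tp=40, fp=10, tn=440, fn=10)
```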

Statistical analysis

For the purposes of this study, sensitivities, specificities, predictive values and accuracy were tabulated. ROC curves, AUC values and confidence intervals (CIs) were generated using IBM SPSS software V.29.0.2.0.
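The study's AUC values were produced in SPSS. As a stand-in to show what the statistic measures, the Python sketch below computes the empirical AUC as the proportion of (fracture, normal) case pairs in which the fracture case receives the higher score, with ties counted as one half. The function name and the example data are assumptions; this is not the procedure the authors ran, and it does not compute CIs.

```python
from typing import Sequence


def empirical_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: share of positive/negative pairs ranked correctly.

    labels: 1 for fracture, 0 for normal (gold standard).
    scores: reader or AI output; binary reads may be coded as 0 or 1.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up confidence scores for four cases (two fractures, two normals).
auc_scores = empirical_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
# Made-up binary reads: the reader detects one of the two fractures.
auc_binary = empirical_auc([1, 1, 0, 0], [1, 0, 0, 0])
```

For a binary normal/abnormal read, the ROC curve has a single operating point and this quantity reduces to the mean of sensitivity and specificity.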
