Authors (including presenting author) :
Chan CKR (1), Palmerston JB (2), Garifullin A (2), Wan JCH (2), So B (2), CHAN MHH (3)
Affiliation :
(1) Department of Pathology, Alice Ho Miu Ling Nethersole Hospital, (2) Artificial Intelligence Systems Team 1: AI Lab and Innovation, Information Technology and Health Informatics Division, (3) Department of Chemical Pathology
Prince of Wales Hospital
Keyword 1: :
histopathology
Keyword 2: :
artificial intelligence
Keyword 3: :
image analysis
Keyword 4: :
gastric biopsy
Introduction :
Gastric biopsies are among the most common specimens processed in a histology laboratory, accounting for close to half of all biopsies received. Interpreting these specimens is a resource-intensive manual task for pathologists. More than half of gastric biopsies are normal, and only a minority show clinically significant abnormalities such as cancer, intestinal metaplasia and Helicobacter infection. An automatic triage system has the potential to improve the diagnostic process by reducing pathologists’ burden and turn-around time. This study presents a computer-vision-based artificial intelligence (AI) system designed to detect abnormalities in whole-slide images (WSIs) of gastric biopsies.
Objectives :
To develop and evaluate an AI system to reduce workload for expert pathologists by detecting abnormalities in gastric WSIs in three categories: abnormal tissue, Helicobacter infection (HPACG), and intestinal metaplasia.
Methodology :
The study used a multi-stage AI pipeline. The clustering-constrained-attention multiple-instance learning (CLAM) method was used to process WSIs into image patches. MedSigLip, a large pre-trained medical imaging foundation model, encoded features from these patches. The CLAM multiple-instance learning (MIL) framework then aggregated the patch features to classify each WSI in three independent binary classification tasks: normal vs abnormal, Helicobacter infection (HPACG), and intestinal metaplasia (IM). The system was trained on 2,542 de-identified WSIs and evaluated on a subset via 10-fold cross-validation in independent groups per classification task. Classification metrics, including sensitivity, true negative rate (TNR), precision and negative predictive value (NPV), were computed at a standard model confidence threshold of 0.5, along with the threshold-independent area under the ROC curve (AUC).
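The reported metrics can be illustrated with a minimal sketch in plain Python. It is not the study's evaluation code: the function names `confusion_metrics` and `roc_auc` are hypothetical, the toy labels and scores are made up for illustration, and treating a score equal to the threshold as positive is an assumption.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, TNR, precision and NPV at a fixed confidence threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        # Assumption: a score exactly at the threshold counts as positive.
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "tnr": ratio(tn, tn + fp),          # TN / (TN + FP)
        "precision": ratio(tp, tp + fp),    # TP / (TP + FP), also called PPV
        "npv": ratio(tn, tn + fn),          # TN / (TN + FN)
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-independent AUC: the probability that a randomly chosen
    positive slide scores higher than a randomly chosen negative slide,
    with ties counted as one half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        return float("nan")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy slide-level labels (1 = abnormal) and model confidences, illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.45]
print(confusion_metrics(toy_labels, toy_scores))
print(roc_auc(toy_labels, toy_scores))
```

The pairwise AUC loop is quadratic in the number of slides, which is acceptable for a sketch; a rank-based implementation would be preferable for larger evaluation sets.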
Result & Outcome :
Evaluation of the trained system on the three binary classification tasks yielded the following macro-averaged performance metrics: sensitivity ranged from 0.69 (normal vs abnormal) to 0.76 (intestinal metaplasia); TNR ranged from 0.85 (normal vs abnormal) to 0.96 (HPACG); NPV ranged from 0.88 (normal vs abnormal) to 0.97 (HPACG and intestinal metaplasia); precision (PPV) ranged from 0.55 (intestinal metaplasia) to 0.63 (normal vs abnormal); and AUC ranged from 0.84 (normal vs abnormal) to 0.92 (HPACG). These results indicate promising proof-of-concept performance.