Authors (including presenting author) :
Ng KH (1)(2), Yeung JHH (1)(2), Lee SF (1), Hung CC (2), Graham CA (2)
Affiliation :
(1) Accident and Emergency Department, Princess Margaret Hospital, Hong Kong (2) Accident and Emergency Medicine Academic Unit, The Chinese University of Hong Kong, Hong Kong
Keyword 1 :
Artificial intelligence
Keyword 2 :
Large language model
Keyword 3 :
Emergency department
Introduction :
The growing demand for emergency services has highlighted the need for innovative approaches to expedite patient management. Artificial intelligence (AI)-powered tools offer an opportunity to improve the efficiency of emergency triage.
Objectives :
This study examined the accuracy of ChatGPT (GPT-4), used as a baseline large language model (LLM) without task-specific training, in predicting emergency triage categories, compared with frontline emergency department (ED) nurses, based on local ED triage guidelines in Hong Kong.
Methodology :
This was a single-center, retrospective analysis of 235 triage cases sampled over one year. Concordance of triage decisions with the audit group, which served as the gold standard, was evaluated for both the triage team and GPT-4 using quadratic weighted kappa (QWK). Overall triage accuracy was compared between the two groups using the McNemar test.
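The two statistics named above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, triage categories are assumed to be ordinal levels supplied in order (for example 1 to 5), each case is assumed to be scored as appropriate or inappropriate for both nurses and GPT-4, and because the abstract does not state which McNemar variant was used, the continuity-corrected chi-square form is assumed.

```python
from collections.abc import Hashable, Sequence
from math import erfc, sqrt


def quadratic_weighted_kappa(rater_a: Sequence[Hashable],
                             rater_b: Sequence[Hashable],
                             categories: Sequence[Hashable]) -> float:
    """Cohen's kappa with quadratic disagreement weights (hypothetical helper).

    `categories` must list the ordinal levels in order, e.g. triage 1..5.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and paired")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordinal categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed k x k matrix of paired ratings (rows: rater A, columns: rater B).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1

    # Marginal totals for each rater, used for chance-expected counts.
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2         # quadratic weight, 0 on the diagonal
            expected = row_tot[i] * col_tot[j] / n   # count expected by chance alone
            weighted_obs += w * observed[i][j]
            weighted_exp += w * expected
    if weighted_exp == 0:
        raise ValueError("kappa undefined: no chance disagreement")
    return 1.0 - weighted_obs / weighted_exp


def mcnemar(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> tuple[float, float]:
    """Continuity-corrected McNemar chi-square and p-value for paired outcomes."""
    # Only discordant pairs carry information about a difference in accuracy.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(stat / 2))  # upper tail of a chi-square with 1 degree of freedom
    return stat, p


# Hypothetical toy data (not the study's data): triage categories 1 to 5.
audit = [1, 2, 3, 3, 4, 5]
nurse = [1, 2, 3, 3, 4, 4]
gpt4 = [1, 3, 3, 4, 4, 5]
print(quadratic_weighted_kappa(nurse, audit, [1, 2, 3, 4, 5]))
print(mcnemar([n == a for n, a in zip(nurse, audit)],
              [g == a for g, a in zip(gpt4, audit)]))
```

If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` computes the same QWK; the hand-written version simply makes the weighting explicit.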
Result & Outcome :
Agreement between the triage team and the audit group reached a QWK of 0.922 (95% CI 0.897 to 0.947), significantly higher than the agreement between GPT-4 and the audit group (QWK 0.847, 95% CI 0.659 to 0.778). GPT-4 assigned the appropriate triage category in 69.4% of cases, significantly fewer than the frontline nurses (p < 0.001). Although GPT-4 achieved almost perfect agreement with the gold standard, it performed worse than frontline nurses in the emergency triage of ED patients. Prospective studies comparing trained LLMs with frontline nurses in ED settings should be considered.
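Describing a QWK of 0.847 as almost perfect agreement matches the widely used Landis and Koch (1977) benchmarks for kappa. The abstract does not name its interpretive scale, so the mapping below is an assumption, and `landis_koch_label` is a hypothetical helper; treating each band's upper boundary as inclusive is also an assumption for continuous kappa values.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    # Upper limits treated as inclusive: 0.20, 0.40, 0.60, 0.80.
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Reported point estimates from the abstract.
print(landis_koch_label(0.922))  # triage team vs audit group
print(landis_koch_label(0.847))  # GPT-4 vs audit group
```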