Evaluating ChatGPT-4 for Emergency Department Triage in Hong Kong: A Retrospective Study

This abstract has open access

Abstract Description

Submission ID :

HAC10

Submission Type

HA Staff

Authors (including presenting author) :

Ng KH (1)(2), Yeung JHH (1)(2), Lee SF (1), Hung CC (2), Granham CA (2)

Affiliation :

(1) Accident and Emergency Department, Princess Margaret Hospital, Hong Kong (2) Accident and Emergency Medicine Academic Unit, The Chinese University of Hong Kong, Hong Kong

Keyword 1 :

Artificial intelligence

Keyword 2 :

AI triage

Keyword 3 :

Large Language Model

Keyword 4 :

ChatGPT

Keyword 5 :

Emergency department

Introduction :

The growing demand for emergency services highlights the need for innovative approaches to expedite patient management. Artificial intelligence (AI)-powered tools offer a possible way to improve the efficiency of emergency triage.

Objectives :

This study examined the accuracy of ChatGPT (GPT-4), used as a baseline large language model (LLM) without additional training, in predicting emergency triage categories compared with frontline emergency department (ED) nurses, based on local ED guidelines in Hong Kong.

Methodology :

This single-center, retrospective study analyzed 235 triage cases sampled across one year. We evaluated the concordance of triage decisions between the triage team and the audit group, and between GPT-4 and the audit group, using quadratic weighted kappa (QWK). The overall triage accuracy of the two groups was compared using the McNemar test.
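As an illustration of the concordance measure named above, the following is a minimal sketch of quadratic weighted kappa for ordinal triage categories. It is not the authors' analysis code. The function name, the coding of triage categories as integers 1 to 5, and the toy ratings are all assumptions made for the example.

```python
# Hypothetical sketch: quadratic weighted kappa (QWK) between two raters.
# Assumption: triage categories are ordinal and coded 1 (most urgent) to 5.
CATEGORIES = (1, 2, 3, 4, 5)


def quadratic_weighted_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Return Cohen's kappa with quadratic disagreement weights."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed contingency table: rows = rater_a, columns = rater_b.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 at maximum distance.
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Toy data (invented for illustration, not taken from the study).
audit_group = [1, 2, 3, 3, 4, 5, 2, 3]
model_output = [1, 2, 3, 4, 4, 5, 3, 3]
print(round(quadratic_weighted_kappa(audit_group, model_output), 3))
```

Quadratic weights penalize a disagreement by the squared distance between categories, so a two-level triage error counts four times as much as a one-level error. On the commonly cited Landis and Koch scale, kappa values above 0.80 are described as almost perfect agreement.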

Result & Outcome :

Agreement on triage decisions between the triage team and the audit group reached a QWK of 0.922 (95% CI, 0.897 to 0.947), which was significantly superior to that of GPT-4, with a QWK of 0.847 (95% CI, 0.659 to 0.778). The rate of appropriate triage by GPT-4 was 69.4%, which was significantly lower than that of the nurses' triage (p < 0.001).

This study found that GPT-4 obtained an almost perfect agreement with the gold standard. However, its performance was inferior to that of frontline nurses in the emergency triage of ED patients. Prospective studies comparing trained LLMs with frontline nurses should be considered in ED settings.
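The paired accuracy comparison above relies on the McNemar test. The sketch below shows one way such a test can be computed from per-case correctness flags for two raters judged on the same cases. The abstract does not say whether the exact or the chi-square form was used, so the exact binomial form here is an assumption, as are the function name and the toy data.

```python
# Hypothetical sketch: exact (binomial) McNemar test on paired outcomes.
# Assumption: each case has a True/False flag per rater, meaning the
# rater's triage category matched the reference standard.
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Return (b, c, two-sided p-value) for paired binary outcomes.

    b = cases where rater A is correct and rater B is not;
    c = cases where rater B is correct and rater A is not.
    Only these discordant pairs carry information for the test.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired outcomes must have equal length")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference

    # Under the null hypothesis, min(b, c) follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (invented for illustration, not taken from the study).
nurse_correct = [True, True, True, True, True, True, False, True]
model_correct = [True, False, True, False, True, False, False, True]
print(mcnemar_exact(nurse_correct, model_correct))
```

A paired test is used because both raters triage the same cases, so their outcomes are not independent samples.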