An antibiotic chatbot: Evaluation of a retrieval-augmented generation approach for providing guideline-based antimicrobial advice.

Eyre DW., Corrigan R., Hookham L., Lumley S., Clarke D., Jeffery K., Dunsmure L., Jones N.

Background: Large language models (LLMs) have the potential to provide clinical infection advice, but variations in prevalent pathogens and antimicrobial resistance require models to be adapted to local contexts. We evaluated a retrieval-augmented generation (RAG) approach to provide antibiotic and infection advice explicitly constrained to local guidelines.

Methods: Relevant guideline sections from Oxford University Hospitals were identified by combining keyword matching and a medical embedding model. A locally deployed LLM (gpt-oss-20b) generated answers using the retrieved context. Performance was assessed using 200 simulated questions with an LLM-as-judge, and 66 human-written questions reviewed by ≥2 infection specialists.

Results: The model attempted to answer 186/200 (93%) simulated clinical advice queries, of which 162 (87%) responses were judged fully correct, 14 (8%) partially correct, and 10 (5%) incorrect. Performance was lower in complex scenarios, e.g., when renal impairment was present. For 57 human-written questions covered by guidelines, 46 (81%) single-stage responses were fully correct and 10 (18%) partially correct. Of 9 out-of-scope questions, 5 (56%) were correctly identified as such. A multi-stage pipeline modestly improved performance (84% fully correct). Median answer generation time was 12 s (single-stage) and 15 s (multi-stage). LLMs without RAG-based local guideline context performed worse: 21/186 (11%) answers to simulated questions were fully correct with the same locally deployed LLM, and 92/200 (46%) with a current frontier model (gpt-5.4).

Conclusion: An LLM grounded in local antimicrobial guidelines can deliver mostly accurate, concise infection advice, but it still generates occasional errors and does not always recognise out-of-scope queries. Further optimisation and safety mechanisms are required before routine clinical deployment.
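The retrieval step described in the Methods (keyword matching combined with an embedding model, followed by generation constrained to the retrieved context) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the equal weighting of the two scores, the placeholder guideline text, and the prompt wording are all assumptions made for illustration, and a bag-of-words vector stands in for the medical embedding model, which the abstract does not name.

```python
import math
from collections import Counter

# Hypothetical placeholder sections; NOT real guideline content.
SECTIONS = {
    "Urinary tract infection": "placeholder guidance text for urinary tract infection in adults",
    "Cellulitis": "placeholder guidance text for cellulitis and skin infection",
    "Pneumonia": "placeholder guidance text for community acquired pneumonia",
}

def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def keyword_score(query, section):
    """Fraction of distinct query tokens that also appear in the section."""
    q, s = set(tokenize(query)), set(tokenize(section))
    return len(q & s) / len(q) if q else 0.0

def toy_embed(text):
    """Stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, sections, top_k=2, weight=0.5):
    """Rank sections by a weighted sum of keyword and embedding scores."""
    q_vec = toy_embed(query)
    scored = []
    for title, body in sections.items():
        score = (weight * keyword_score(query, body)
                 + (1 - weight) * cosine(q_vec, toy_embed(body)))
        scored.append((score, title))
    scored.sort(reverse=True)
    # Drop sections with no overlap at all, so an empty list can signal
    # that the question may be out of scope.
    return [title for score, title in scored[:top_k] if score > 0]

def build_prompt(query, sections, titles):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(f"[{t}]\n{sections[t]}" for t in titles)
    return (
        "Answer using only the guideline context below. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    question = "What should I give for cellulitis?"
    hits = retrieve(question, SECTIONS, top_k=1)
    print(build_prompt(question, SECTIONS, hits))
```

In a real deployment the prompt would be passed to the locally deployed LLM, and the toy embedding would be replaced by the actual embedding model. Returning no sections when nothing matches is one simple way to flag a possibly out-of-scope query, although the abstract does not say how the evaluated system detects such queries.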

DOI

10.1016/j.jinf.2026.106789

Type

Journal article

Publication Date

2026-06-01

Volume

93

Addresses

Big Data Institute, University of Oxford, Oxford, UK; National Institute for Health Research Health Protection Research Unit in Antimicrobial Resistance and Healthcare Associated Infection, University of Oxford, Oxford, UK; National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK; Oxford University Hospitals, Oxford, UK; Nuffield Department of Medicine, University of Oxford, Oxford, UK. Electronic address: david.eyre@bdi.ox.ac.uk.
