Oxford University Lead: Derrick Crook

PHE co-lead: Saheer Gharbia

We aim to determine how to analyse and compare the genetic code of millions of microbes causing infections across the world. Our vision is to find better ways to manage and prevent threats from antimicrobial resistance and healthcare-associated infections by detecting them faster and working out who most needs protecting and how we can do this. We will develop automated software to read, interpret and report results from genetic code, identifying the type of microbe, whether it is resistant to certain antimicrobials and whether there is an. We will also improve our workflow for obtaining pathogen DNA/RNA (and thus genetic code) directly from clinical samples, and our understanding of how genetic changes lead to antimicrobial resistance.


Sequencing Theme 

Project title 15:

Refine SP3 to deliver faster, more streamlined solutions for more pathogens  

Project start: April 2020 

Project leads: PHE: Saheer Gharbia and Richard Myers; OU lead: Fan Yang-Turner; EBI lead: Zamin Iqbal. Researchers: OU: Stephen Bush, Dennis Volk, Philip Fowler; PHE: bioinformatician (vacant); EBI: Martin Hunt

We have a prototype for processing mycobacterial sequences ready for extensive testing. This prototype will also be reused with key modifications to process sequences for Clostridium difficile and Listeria monocytogenes. This will be undertaken sequentially as follows: 

  • Processing of mycobacterial sequences will be developed to the point where it is ready for a parallel run with existing pathogen processing pipelines within Oxford and PHE, and an implementation plan will be developed. Key modules to test are upload of data and data processing through species identification, variant calling, resistance prediction and nearest-neighbour identification. An extension to scope for UKAS accreditation will be sought.
  • A workflow for processing Clostridium difficile genome sequences. Key components beyond those for mycobacteria will be a core genome multilocus sequence typing (cgMLST) module, automatic fine-scale phylogenetic analysis of clusters, and resistance prediction. The workflow will be tested and prepared for implementation in support of national C. difficile surveillance.  
  • A workflow for processing Listeria monocytogenes will be developed. It will extend the current clonal complex and single nucleotide polymorphism (SNP) based strain comparison by integrating environmentally regulated virulence determinants, to support tracking of longitudinal colonisation of food processing and healthcare facilities.
  • Data “truth sets” will be developed for the above three workflows, so that users can verify and validate them during user acceptance testing before roll-out, and again when testing new versions of the software after implementation.
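
As an illustration of how a truth set could be used in acceptance testing, the following minimal Python sketch compares the variants called by a workflow against a truth set and reports precision and recall. The function name, the variant representation and the example data are assumptions made for illustration only; they are not taken from SP3.

```python
from typing import Dict, Set, Tuple

# A variant is represented here as (position, reference base, alternate base).
# This representation is an assumption for illustration, not the SP3 format.
Variant = Tuple[int, str, str]


def compare_to_truth(called: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Compare workflow calls against a truth set and summarise agreement."""
    true_positives = len(called & truth)   # called and in the truth set
    false_positives = len(called - truth)  # called but not in the truth set
    false_negatives = len(truth - called)  # in the truth set but missed
    called_total = true_positives + false_positives
    truth_total = true_positives + false_negatives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # By convention here, an empty denominator gives a score of 1.0.
        "precision": true_positives / called_total if called_total else 1.0,
        "recall": true_positives / truth_total if truth_total else 1.0,
    }


# Made-up example data: three known variants, two of which the workflow finds.
truth_set = {(100, "A", "G"), (250, "C", "T"), (900, "G", "A")}
pipeline_calls = {(100, "A", "G"), (250, "C", "T"), (400, "T", "C")}

report = compare_to_truth(pipeline_calls, truth_set)
print(report)
```

Re-running the same comparison against each new software version would give a simple regression check, with the acceptance thresholds for precision and recall agreed in advance by the users.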

Key milestones and timescales: 

April 2021 – Parallel run of mycobacterial SP3 service

July 2021 – Validation run of C. difficile SP3 service 

March 2022 – Parallel run of L. monocytogenes SP3 service. 

April 2021 to March 2022 – establish the relevant truth sets and complete acceptance testing for each of the three pathogen-specific processing services.


Project title 16:

Improve genomic data processing and storage  

EBI lead: Zamin Iqbal. Researcher: Martin Hunt 

Project start: April 2020 

We will improve information organisation using genome graphs, allowing reference-free, rapid identification of genetic relatedness and AMR prediction across vast datasets. New variant calling algorithms for Illumina and Oxford Nanopore reads will be developed and tested, offering the prospect of fast, easy-to-use approaches for processing data with minimal computational resources.  

Key milestones and timescales: 

March 2021 – submit publication on novel graph-based variant calling

March 2022 – submit publication on graph-based outbreak detection 


Project title 17:

Refine AMR prediction for M. tuberculosis (TB), and Escherichia coli 

OU leads: Derrick Crook, Nicole Stoesser, David Eyre, David Clifton; PHE leads: Saheer Gharbia, Richard Myers; EBI lead: Zamin Iqbal. Researchers: Martin Hunt, Philip Fowler, Tim Davies, Bede Constantinides, Sam Lipworth; PHE: bioinformatician (vacant).

  • TB AMR prediction for all antituberculosis drugs will be substantially enhanced through linked work of the CRyPTIC consortium and a UNITAID-funded project.
  • Using large phenotypically and genomically characterised PHE/OU datasets, AMR prediction for E. coli will be refined, validated and implemented, initially to surveillance standards. This will be done by improving prediction algorithms, by further discovery of variants conferring resistance using genome-wide association studies (GWAS) and machine learning, and through collaborations with developers of the most widely used publicly available AMR databases (i.e. CARD, ResFinder and NCBI’s NDARO resource).
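
One simple form of genotype-based AMR prediction is a catalogue lookup: each variant found in a genome is checked against a catalogue of variants associated with resistance. The Python sketch below illustrates only that idea. The catalogue entries, variant labels and drug names are hypothetical placeholders, and the sketch does not represent the prediction algorithms, GWAS or machine learning methods referred to above.

```python
from typing import Dict, Iterable, Set

# Hypothetical catalogue mapping a variant label to the drugs it is
# associated with resistance to. All entries are placeholders, not real data.
CATALOGUE: Dict[str, Set[str]] = {
    "geneA_X10Y": {"drug_1"},
    "geneB_P55Q": {"drug_1", "drug_2"},
}


def predict_resistance(variants: Iterable[str], drugs: Iterable[str]) -> Dict[str, str]:
    """Return 'R' for each drug with a catalogued variant present, else 'S'.

    'S' here only means that no catalogued variant was found; it is not
    evidence of susceptibility in its own right.
    """
    resistant_drugs: Set[str] = set()
    for variant in variants:
        # Variants absent from the catalogue contribute nothing.
        resistant_drugs |= CATALOGUE.get(variant, set())
    return {drug: ("R" if drug in resistant_drugs else "S") for drug in drugs}


# Made-up genome carrying one catalogued and one uncatalogued variant.
prediction = predict_resistance(
    ["geneB_P55Q", "geneC_L3M"],
    ["drug_1", "drug_2", "drug_3"],
)
print(prediction)
```

A lookup of this kind is only as good as its catalogue, which is why discovering further resistance-conferring variants and validating against phenotypically characterised datasets matter.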

Key milestones and timescales: 

March 2021 – submit publication on catalogue of variation conferring TB resistance

July 2021 – submit publications on detailed analysis of TB genomic variation conferring resistance to antituberculosis drugs 

March 2022 – establish software-based automated resistance prediction to surveillance standards for E. coli


Project title 18:

Scale services for identifying nearest neighbours for outbreak detection and transmission tracking.  

PHE lead: David Wyllie, OU lead: David Eyre, EBI lead: Zamin Iqbal. Researchers: OU: Stephen Bush; PHE: bioinformatician vacant; EBI: Martin Hunt 

Currently, PHE services use in-RAM reference-based compression approaches that scale to < 100,000 genomes. We will evaluate enhanced in-RAM reference-based approaches for rapidly identifying nearest neighbours. In addition, automated stepwise workflows scaling to millions of genomes will be investigated. These may include pre-categorisation, e.g. using automated cgMLST, genome graph approaches (as described above), or other promising emerging enhancements.
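
To illustrate the general idea of reference-based compression for nearest-neighbour searches, the Python sketch below stores each genome as only the positions at which it differs from a reference, and counts SNP differences between pairs of such records. It is a simplified sketch under stated assumptions and not the PHE implementation: the function names, threshold and sample data are hypothetical, and uncertain or masked positions are ignored.

```python
from typing import Dict, List, Tuple

# A genome is stored as {position: base} only for positions that differ from
# the reference; any position not listed is assumed to match the reference.
Compressed = Dict[int, str]


def snp_distance(a: Compressed, b: Compressed) -> int:
    """Count positions at which two reference-compressed genomes differ."""
    positions = a.keys() | b.keys()
    # A position missing from one genome means it carries the reference base
    # there, so it differs from any non-reference base stored in the other.
    return sum(1 for pos in positions if a.get(pos) != b.get(pos))


def nearest_neighbours(
    query: Compressed, store: Dict[str, Compressed], threshold: int
) -> List[Tuple[str, int]]:
    """Return stored samples within `threshold` SNPs of the query, closest first."""
    hits = [(name, snp_distance(query, genome)) for name, genome in store.items()]
    within = [hit for hit in hits if hit[1] <= threshold]
    return sorted(within, key=lambda hit: (hit[1], hit[0]))


# Made-up in-memory store of three samples and one query genome.
store = {
    "s1": {10: "A", 50: "T"},
    "s2": {10: "A"},
    "s3": {10: "A", 50: "C", 200: "G"},
}
query = {10: "A", 50: "T", 70: "G"}

neighbours = nearest_neighbours(query, store, threshold=2)
print(neighbours)
```

This sketch compares the query against every stored genome, so its cost grows linearly with the size of the store. Pre-categorisation steps such as cgMLST would aim to reduce the number of comparisons made at this level of detail.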

Key milestones and timescales: 

March 2021 – complete evaluation of enhanced in-RAM reference-based approaches for rapidly identifying nearest neighbours for >100,000 TB genomes

July 2021 – complete evaluation of cgMLST for outbreak detection for C. difficile 

March 2022 – evaluate graph-based approaches for enhancing outbreak detection from many hundreds of thousands of genomes. 


Project title 19: *New Project* COVID LamPORE diagnostics validation and implementation


Sequencing Theme Publications


  • Evaluation of methods for detecting human reads in microbial sequencing datasets

Bush S, Connor T, Peto T, Crook D, Walker A


  • Read trimming has minimal effect on bacterial SNP-calling accuracy

Bush SJ


  • The importance of using whole genome sequencing and extended spectrum beta-lactamase selective media when monitoring antimicrobial resistance

Duggett N, AbuOun M, Randall L, Horton R, Lemma F, Rogers J, Crook D, Teale C, Anjum M



  • Utility of whole genome sequencing in assessing and enhancing partner notification of Neisseria gonorrhoeae infection

Kong L, Wilson J, Moura I, Fawley W, Kelly L, Walker, Eyre D, Wilcox M

Please contact to receive a copy of this publication


  • Diagnosis of SARS-CoV-2 infection with LamPORE, a high-throughput platform combining loop-mediated isothermal amplification and nanopore sequencing

Peto L, Rodger G, Carter D, Osman K, Yavuz M, Johnson K, Raza M, Parker M, Wyles M, Andersson M, Justice A, Vaughan A, Hoosdally S, Stoesser N, Matthews P, Eyre D, Peto T, Carroll M, De Silva T, Crook D, Evans C, Pullan S