Read alignment is a central step of many analytic pipelines that perform SNP calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as 'trimming'. Trimming is widely assumed to increase the accuracy of SNP calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase in both number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study evaluated the impact of four read trimming utilities (Atropos, fastp, Trim Galore, and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP calling pipelines. We found that read trimming produced only small, statistically non-significant increases in SNP calling accuracy, even when using the highest-performing pre-processor, fastp. To extend these findings, we re-analysed more than 6500 publicly archived sequencing datasets from E. coli, M. tuberculosis and S. aureus. Of the approximately 125 million SNPs called across all samples, the same bases were called in 98.8% of cases, irrespective of whether raw reads or trimmed reads were used. However, when using trimmed reads, the proportion of non-homozygous calls (a proxy for false positives) was significantly reduced, by approximately 1%. This suggests that trimming rarely alters the set of variant bases called but can affect their level of support. We conclude that read quality- and adapter-trimming add relatively little value to a SNP calling pipeline and may only be necessary if small differences in the absolute number of SNP calls are critical.
Read trimming is still routinely performed before SNP calling, likely out of concern that omitting it would substantially increase the number of false positive calls. While historically this may have been the case, our data suggest this concern is now unfounded.
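
To make the two headline quantities concrete, the following is a minimal sketch of how call concordance between raw-read and trimmed-read SNP sets, and the proportion of non-homozygous calls, might be computed. It is not the study's own code. The function names, the dictionary representation of calls, the use of the union of sites as the denominator, and the 0.95 allele-fraction threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A SNP site is identified by (contig name, 1-based position). Hypothetical representation.
Site = Tuple[str, int]


def concordance(raw: Dict[Site, str], trimmed: Dict[Site, str]) -> float:
    """Fraction of sites called in either set at which both sets report the same base.

    Assumption: the denominator is the union of sites called from raw or trimmed reads.
    """
    all_sites = set(raw) | set(trimmed)
    if not all_sites:
        raise ValueError("no SNP calls to compare")
    same = sum(1 for site in all_sites if site in raw and raw[site] == trimmed.get(site))
    return same / len(all_sites)


def non_homozygous_fraction(alt_fractions: List[float], threshold: float = 0.95) -> float:
    """Fraction of calls whose alternate-allele read fraction falls below `threshold`.

    Assumption: a call counts as non-homozygous when fewer than `threshold` of the
    reads covering the site support the called base; 0.95 is an illustrative value.
    """
    if not alt_fractions:
        raise ValueError("no SNP calls supplied")
    mixed = sum(1 for fraction in alt_fractions if fraction < threshold)
    return mixed / len(alt_fractions)


# Toy data, invented for illustration only.
raw_calls = {("chr", 100): "A", ("chr", 250): "T", ("chr", 400): "G", ("chr", 900): "C"}
trimmed_calls = {("chr", 100): "A", ("chr", 250): "T", ("chr", 400): "C"}

print(concordance(raw_calls, trimmed_calls))            # 2 of 4 sites agree
print(non_homozygous_fraction([1.0, 0.97, 0.6, 1.0]))   # 1 of 4 calls below threshold
```

In this toy example, a site called only from raw reads, or called with a different base, counts as discordant. A real comparison would read these values from the variant caller's output rather than from hand-written dictionaries.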

Original publication

DOI

10.1101/2020.08.04.236216

Type

Journal article

Publisher

Cold Spring Harbor Laboratory

Publication Date

05/08/2020